Abstract

K-means clustering is one of the simplest unsupervised learning algorithms; it divides the units of a given data set into a predetermined number of distinct clusters. Like other iterative methods, it starts the cluster analysis from randomly chosen initial center points. Similar units are grouped into clusters around these initial centers, so a random choice of initial points may lead to biased results. In addition, deciding which of the results obtained from different initial centers is more valid is another major problem of the K-means algorithm. To demonstrate the initial center problem of K-means clustering, we designed a fictitious study. In its first part, we partitioned the data set into two and three clusters using every possible combination of initial centers drawn from the data set. Since initial centers can take values anywhere, not only at data points, we also developed a simple algorithm to construct new initial centers that lie outside the data set. Some of the constructed centers lie very close to units of the data set, while others lie far from them. In the second part of the study, we clustered the same data set with these new (progressed) initial centers and found new cluster sets that could not be obtained with initial centers from the data set. We also aimed to show that different cluster groups arise depending on whether the method is started with initial centers from inside the data set, from outside the data set, or with a combination of both.
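The exhaustive search over initial centers can be illustrated with a minimal sketch. It assumes one-dimensional data with distinct values, squared Euclidean distance, and a plain Lloyd-style update; the data values and the names `lloyd`, `partition`, and `distinct_partitions` are hypothetical and are not taken from the study itself.

```python
from itertools import combinations


def lloyd(points, centers, max_iter=100):
    """Run a plain Lloyd-style K-means from the given initial centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (ties go to the lowest index).
        new_labels = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def partition(points, labels):
    """Label-independent form of a clustering (assumes distinct point values)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return frozenset(frozenset(g) for g in groups.values())


def distinct_partitions(points, k):
    """Try every k-subset of the data as initial centers; group the inits by final partition."""
    results = {}
    for init in combinations(points, k):
        results.setdefault(partition(points, lloyd(points, init)), []).append(init)
    return results


# Hypothetical data: three well-separated pairs of units.
data = [0.0, 1.0, 5.0, 6.0, 10.0, 11.0]
results = distinct_partitions(data, 3)
for part, inits in results.items():
    print(sorted(sorted(c) for c in part), "reached from", len(inits), "initializations")
```

Because `lloyd` accepts arbitrary center values, the same function can also be started from points outside the data set, for example `lloyd(data, (0.1, 5.1, 10.1))`. How such outside centers should be constructed is left open here.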