Improved K-means algorithm based on clustering criterion function

ZHANG Xuefeng, ZHANG Guizhen, LIU Peng
DOI: https://doi.org/10.3778/j.issn.1002-8331.2011.11.035
2011-01-01
Abstract: The criterion function used in the K-means algorithm is the sum of squared errors, which may not work well for datasets containing clusters of different sizes and densities. In this study, the criterion function is redefined as the sum of weighted standard deviations, where the weight of each cluster is the ratio of the number of points in that cluster to the total number of points. The way each point is assigned to a centroid is also modified: instead of being assigned to the closest centroid, each point is assigned to the centroid with the minimum weighted distance. Experiments on simulated datasets show that the improved K-means algorithm significantly enhances clustering quality by reducing the probability of misclassifying points of large sparse clusters into neighboring compact clusters. Experiments on UCI datasets show that the improved algorithm obtains more compact clusters. Therefore, the improved K-means algorithm is effective.
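The two criterion functions contrasted in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the per-cluster standard deviation is assumed to be the root-mean-square distance of a cluster's points to its centroid, since the abstract gives no formula. The weighted distance used in the assignment step is likewise not specified in the abstract, so it is left out here.

```python
import numpy as np


def sse_criterion(X, labels, centroids):
    """Classic K-means criterion: sum of squared distances to the assigned centroid."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


def weighted_std_criterion(X, labels, centroids):
    """Sum over clusters of (n_k / N) * sigma_k.

    Assumption: sigma_k is the root-mean-square distance of the points in
    cluster k to its centroid; the abstract does not spell out the formula.
    """
    n_total = X.shape[0]
    total = 0.0
    for k in range(centroids.shape[0]):
        members = X[labels == k]
        if members.shape[0] == 0:
            continue  # an empty cluster contributes nothing
        sq_dist = np.sum((members - centroids[k]) ** 2, axis=1)
        sigma_k = np.sqrt(sq_dist.mean())
        total += (members.shape[0] / n_total) * sigma_k
    return float(total)


# Toy data: one spread-out cluster and one tight cluster.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0, 0.0], [10.0, 0.0]])

sse = sse_criterion(X, labels, centroids)
wstd = weighted_std_criterion(X, labels, centroids)
```

Because each cluster contributes a standard deviation rather than a raw sum of squares, a large sparse cluster is penalized less heavily than under the squared-error criterion, which is consistent with the motivation given in the abstract.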