K*-Means: an Effective and Efficient K-Means Clustering Algorithm
Jianpeng Qi, Yanwei Yu, Lihong Wang, Jinglei Liu, Yingjie Wang
DOI: https://doi.org/10.1109/bdcloud-socialcom-sustaincom.2016.46
IF: 1.938
2017-01-01
International Journal of Distributed Sensor Networks
Abstract: K-means has been a widely used clustering algorithm in the field of data mining across different disciplines for the past fifty years. However, k-means depends heavily on the positions of its initial centers, and randomly chosen starting centers may lead to poor clustering quality. Motivated by this, this paper proposes an optimized k-means clustering method, named k*-means, along with three optimization principles. First, we propose a hierarchical optimization principle that initializes k* cluster centers (k* > k) to reduce the risk of random seed selection, and then uses the proposed top-n method to merge the nearest clusters, those associated with the n shortest edges, in each round until the number of clusters reaches k. Second, we propose a cluster pruning strategy that improves the efficiency of k-means by omitting the farther clusters, which shrinks the search space for each point in each iteration. Third, we implement an optimized update theory for the k-means iteration, which updates the mean and SSE of a cluster from the points that moved instead of recalculating them, to minimize computation cost. Our comprehensive experimental studies, using 2 synthetic datasets and 4 real-world datasets from the UCI Machine Learning Repository, demonstrate that our method outperforms state-of-the-art methods in both effectiveness and efficiency.
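The third principle, updating the mean and SSE (sum of squared errors) of a cluster only for the points that moved, can be illustrated with a minimal sketch. The abstract does not give the paper's exact update rule, so the code below assumes the standard running-mean and running-SSE identities (Welford-style updates). The names `ClusterStats` and `move_point` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterStats:
    """Running summary of one cluster: size, mean, and SSE.

    Hypothetical helper; the update formulas are the standard
    running-mean/SSE identities, assumed here, not taken from the paper.
    """
    n: int
    mean: List[float]
    sse: float

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "ClusterStats":
        # Full recomputation: O(n * dim). Used once, at initialization.
        n = len(points)
        dim = len(points[0])
        mean = [sum(p[d] for p in points) / n for d in range(dim)]
        sse = sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
        return cls(n, mean, sse)

    def add(self, x: Sequence[float]) -> None:
        # O(dim) update when a point joins the cluster.
        old = self.mean
        self.n += 1
        self.mean = [m + (xi - m) / self.n for m, xi in zip(old, x)]
        self.sse += sum((xi - mo) * (xi - mn)
                        for xi, mo, mn in zip(x, old, self.mean))

    def remove(self, x: Sequence[float]) -> None:
        # O(dim) update when a point leaves the cluster.
        if self.n <= 1:
            raise ValueError("cannot remove the last point of a cluster")
        old = self.mean
        self.n -= 1
        self.mean = [m - (xi - m) / self.n for m, xi in zip(old, x)]
        self.sse -= sum((xi - mo) * (xi - mn)
                        for xi, mo, mn in zip(x, old, self.mean))


def move_point(src: ClusterStats, dst: ClusterStats,
               x: Sequence[float]) -> None:
    """Reassign point x from cluster src to cluster dst."""
    src.remove(x)
    dst.add(x)
```

With this bookkeeping, reassigning one point costs time proportional to the dimensionality rather than to the sizes of the two clusters involved. The results should match a full recomputation up to floating-point rounding.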