A modified parallel k-means clustering with improved initial centers

Yuecheng Yu, Jiandong Wang, Guansheng Zheng, Bin Gu
2010-01-01
Journal of Computational Information Systems
Abstract: Parallel k-means is an efficient algorithm for clustering distributed data sets. Like classic k-means clustering, parallel k-means suffers from a serious drawback: its performance depends heavily on the initial centroids [1]. To improve the quality of the initial centers, we propose a modified parallel k-means algorithm for distributed data clustering. Our method consists of two steps. In the first step, the local clustering results of all distributed sites are used as a priori knowledge to guide the aggregation of local clusters. In the second step, standard parallel k-means is carried out on the whole data set, using the centroids of the aggregated clusters as the initial centers. The proposed parallel clustering method is tested on artificial and real-world data sets. The results show that our method is more effective than standard parallel k-means with random initial centers. © 2010 Binary Information Press.
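The two-step procedure in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the local clusters are aggregated or how the local clusterings are seeded, so everything below is an assumption made for illustration: the sites are simulated sequentially in one process, local centroids are aggregated with a size-weighted k-means, and seeding uses a deterministic farthest-first rule. All function names (`farthest_first`, `partial_stats`, `weighted_kmeans`, `modified_parallel_kmeans`) are hypothetical and not taken from the paper.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def farthest_first(points, k):
    # Assumed seeding rule (not from the paper): start at the first point,
    # then repeatedly add the point farthest from all chosen seeds.
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen

def partial_stats(points, weights, centers):
    # What one site would send to the coordinator: per-cluster weighted
    # coordinate sums and per-cluster total weight.
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    mass = [0.0] * len(centers)
    for p, w in zip(points, weights):
        j = nearest(p, centers)
        mass[j] += w
        for d in range(dim):
            sums[j][d] += w * p[d]
    return sums, mass

def update(centers, sums, mass):
    # New centroid = weighted mean; an empty cluster keeps its old center.
    return [[s / m for s in row] if m > 0 else list(c)
            for c, row, m in zip(centers, sums, mass)]

def weighted_kmeans(points, weights, k, iters=20):
    centers = farthest_first(points, k)
    for _ in range(iters):
        sums, mass = partial_stats(points, weights, centers)
        centers = update(centers, sums, mass)
    _, mass = partial_stats(points, weights, centers)
    return centers, mass

def modified_parallel_kmeans(sites, k, iters=20):
    # Step 1a: each site clusters its own data and reports (centroid, size).
    local_centroids, local_sizes = [], []
    for site in sites:
        centers, mass = weighted_kmeans(site, [1.0] * len(site), k)
        for c, m in zip(centers, mass):
            if m > 0:
                local_centroids.append(c)
                local_sizes.append(m)
    # Step 1b: aggregate local clusters (assumed: size-weighted k-means
    # over the local centroids) to obtain the initial global centers.
    centers, _ = weighted_kmeans(local_centroids, local_sizes, k)
    # Step 2: standard parallel k-means; sites compute partial statistics,
    # the coordinator merges them and updates the centers.
    dim = len(centers[0])
    for _ in range(iters):
        total_sums = [[0.0] * dim for _ in centers]
        total_mass = [0.0] * len(centers)
        for site in sites:
            sums, mass = partial_stats(site, [1.0] * len(site), centers)
            for j in range(len(centers)):
                total_mass[j] += mass[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        centers = update(centers, total_sums, total_mass)
    return centers

# Usage: three simulated sites, each holding points from three synthetic blobs.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
sites = [[(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
          for cx, cy in true_centers for _ in range(40)]
         for _ in range(3)]
centers = modified_parallel_kmeans(sites, k=3)
```

In this sketch only centroids, cluster sizes, and per-cluster sums cross site boundaries, never raw points, which is the usual motivation for parallel k-means on distributed data. A real deployment would replace the inner loop over `sites` with actual message passing between nodes.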