Optimization of k-means clustering algorithm in Hadoop distributed computing framework

Ling SONG, Yun-feng QI, Dong-yang QI
DOI: https://doi.org/10.3969/j.issn.1001-7445.2014.05.015
2014-01-01
Abstract: The classic distributed k-means clustering algorithm selects its initial cluster centers at random. This often requires many iterations, which leads to low clustering efficiency, heavy network traffic, and unstable clustering results. To address these problems, an improved distributed k-means clustering algorithm is proposed. The algorithm selects the initial cluster centers by partitioning the data set and computing the k blocks whose attribute values are most densely concentrated, so that the initial centers are representative of the data; this reduces the number of iterations and improves clustering efficiency. Experiments on the Hadoop distributed platform show that the improved algorithm reduces both the number of iterations and the convergence time.
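The abstract does not spell out how the data set is partitioned or how density is measured, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a uniform grid over the attribute ranges, treats each non-empty grid cell as a block, and takes the mean of each of the k most populated blocks as an initial center. The function name `densest_block_centers` and the `bins` parameter are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict


def densest_block_centers(points, k, bins=4):
    """Return k initial centers: the means of the k most populated grid cells.

    Assumption (not from the paper): blocks are cells of a uniform grid
    with `bins` intervals per attribute.
    """
    if not points:
        raise ValueError("points must not be empty")
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        # Map a point to the index tuple of the grid cell that contains it.
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                idx.append(0)
            else:
                i = int((p[d] - lows[d]) / span * bins)
                idx.append(min(i, bins - 1))  # keep the maximum in the last cell
        return tuple(idx)

    # Group points by block.
    blocks = defaultdict(list)
    for p in points:
        blocks[cell_of(p)].append(p)

    # Rank blocks by population (descending); break ties by cell index.
    ranked = sorted(blocks.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    if len(ranked) < k:
        raise ValueError("fewer non-empty blocks than k; increase bins")

    # The mean of each selected block serves as one initial center.
    return [
        tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        for _, members in ranked[:k]
    ]


if __name__ == "__main__":
    sample = [
        (0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
        (10.0, 10.0), (9.9, 9.8), (9.8, 10.0), (9.9, 9.9),
        (5.0, 5.0),
    ]
    print(densest_block_centers(sample, k=2))
```

Because the selection is deterministic, repeated runs start from the same centers, which is one plausible reading of how such an initialization could stabilize results. Note that this sketch does not prevent two adjacent dense cells from the same cluster from both being chosen; a practical version would likely need a minimum-separation rule. On Hadoop, the per-block counting could presumably be expressed as a single map and reduce pass before the k-means iterations begin, but that mapping is an assumption here.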