A K-means clustering with optimized initial center based on Hadoop platform

Kunhui Lin, Xiang Li, Zhongnan Zhang, Jiahong Chen
DOI: https://doi.org/10.1109/ICCSE.2014.6926466
2014-01-01
Abstract: With the explosive growth of data, traditional clustering algorithms running on separate servers cannot meet demand. To solve this problem, more and more researchers implement traditional clustering algorithms, especially K-means, on cloud computing platforms. However, few researchers pay attention to the structure of K-means clustering itself; most optimize the model of the cloud computing platform to raise the computing speed of K-means clustering, so the instability caused by random initial centers remains. In this paper, we propose a K-means clustering algorithm with optimized initial centers based on data dimensional density. This method avoids the deficiency of random initial centers and improves the stability of K-means clustering. The experimental results show that the approach performs well and improves the accuracy of K-means clustering on the test set.
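To make the idea of replacing random initial centers with density-guided ones concrete, here is a minimal single-machine sketch. The abstract does not specify how "data dimensional density" is computed, so this is not the authors' method: the neighbor-count density, the `radius` parameter, the density-times-distance score, and the function names `density_init` and `kmeans` are all assumptions chosen for illustration. The Hadoop/MapReduce distribution is also omitted.

```python
import math

def density_init(points, k, radius):
    """Pick k initial centers deterministically (illustrative stand-in,
    not the paper's dimensional-density procedure)."""
    # Assumed density measure: number of points within `radius` (self included).
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    # First center: the densest point (ties go to the lowest index).
    first = max(range(len(points)), key=lambda i: density[i])
    centers = [points[first]]
    while len(centers) < k:
        # Favor points that are both dense and far from the centers chosen so far.
        def score(i):
            return density[i] * min(math.dist(points[i], c) for c in centers)
        centers.append(points[max(range(len(points)), key=score)])
    return centers

def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of its cluster; an empty cluster keeps its old center.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

# Two well-separated toy blobs (made-up data, for illustration only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
initial = density_init(points, k=2, radius=1.5)
final = kmeans(points, initial)
```

Because the seeding involves no random draw, repeated runs on the same data start from the same centers and therefore converge to the same clustering, which is the stability property the abstract is concerned with.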