Parallel implementing k-means clustering algorithm using MapReduce programming mode

Jiang Xiaoping, Li Chenghua, Xiang Wen, Zhang Xinfang, Yan Haitao
DOI: https://doi.org/10.13245/j.hust.2011.s1.031
2011-01-01
Abstract: This paper studies how to implement the k-means clustering algorithm using the MapReduce programming model. In the Map function, the distance between each point and each cluster center is calculated, and each point is assigned the ID of its new cluster. All points with the same key (the current cluster ID) are sent to a single reducer, which computes the new cluster centroid for the next MapReduce job. Experiments on the Hadoop platform show basically linear speedup as the number of nodes increases, as well as good scalability.
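The Map and Reduce steps described in the abstract can be sketched in plain Python. This is a minimal single-process illustration, not the authors' Hadoop implementation: the function names (`kmeans_map`, `kmeans_reduce`, `run_job`, `kmeans`), the Euclidean distance metric, the convergence tolerance, and the handling of empty clusters are all assumptions made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (ID of the nearest centroid, point)."""
    cluster_id = min(range(len(centroids)),
                     key=lambda i: dist(point, centroids[i]))
    return cluster_id, point


def kmeans_reduce(cluster_id, points):
    """Reduce step: average all points that share one cluster ID."""
    n = len(points)
    centroid = tuple(sum(coords) / n for coords in zip(*points))
    return cluster_id, centroid


def run_job(points, centroids):
    """Simulate one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        cid, value = kmeans_map(p, centroids)
        groups[cid].append(value)  # stands in for the shuffle phase
    # Assumption: a cluster that receives no points keeps its old centroid.
    new_centroids = list(centroids)
    for cid, members in groups.items():
        _, new_centroids[cid] = kmeans_reduce(cid, members)
    return new_centroids


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Chain jobs until the centroids stop moving (assumed stopping rule)."""
    for _ in range(max_iters):
        updated = run_job(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` groups the first two and last two points and returns their means as the two centroids. On a real cluster the grouping inside `run_job` would be performed by the framework's shuffle, and each iteration would be submitted as a separate job, as the abstract describes.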