Parallel PK-means Algorithm on Meteorological Data Using MapReduce

Xue Shengjun (薛胜军), Pan Wubin (潘吴斌)
DOI: https://doi.org/10.3963/j.issn.1671-4431.2012.12.028
2012-01-01
Wuhan Ligong Daxue Xuebao/Journal of Wuhan University of Technology
Abstract: With advances in meteorological information technology, the volume of meteorological data is growing exponentially. Because of this rapid growth, the K-means algorithm can no longer easily meet the demands of practical applications. Based on the characteristics of meteorological data, this paper proposes a parallel K-means algorithm (PK-means) based on MapReduce. The Map function calculates the distance between each point and each cluster center and assigns each point a new cluster ID. The Reduce function then calculates the new cluster centers. The computation iterates, and in intermediate iterations only the distances between each center and the points in its relevant cluster are calculated. Experimental results show that the improved parallel K-means algorithm has better computing capability and scalability.
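The Map/Reduce split described in the abstract can be illustrated with a minimal single-process sketch of plain MapReduce-style K-means. A map step assigns each point the ID of its nearest center, a grouping step stands in for the MapReduce shuffle, and a reduce step averages each group into a new center. The function names (`map_assign`, `reduce_center`, `pk_means`), the use of Euclidean distance, the convergence tolerance, and keeping an empty cluster's old center are all assumptions for illustration, not the authors' implementation. The paper's restriction of distance computations in intermediate iterations is not modeled here.

```python
import math
from collections import defaultdict


def map_assign(point, centers):
    """Map step (hypothetical): emit (nearest cluster ID, (point, 1)) for one point."""
    nearest = min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))
    return nearest, (point, 1)


def reduce_center(cluster_id, values):
    """Reduce step (hypothetical): average all (vector_sum, count) pairs for cluster_id.

    Values may be raw points with count 1 or combiner-style partial sums.
    """
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for vector, n in values:
        for i, x in enumerate(vector):
            sums[i] += x
        count += n
    return cluster_id, tuple(s / count for s in sums)


def pk_means(points, initial_centers, max_iter=20, tol=1e-9):
    """Iterate map -> shuffle -> reduce until no center moves more than tol."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Shuffle (simulated in memory): group map outputs by cluster ID.
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centers)
            groups[key].append(value)
        # Assumption: a cluster that receives no points keeps its old center.
        new_centers = list(centers)
        for cid, values in groups.items():
            _, new_centers[cid] = reduce_center(cid, values)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Hypothetical 2-D readings (e.g. standardized temperature/humidity pairs).
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(pk_means(points, [(0.0, 0.0), (10.0, 10.0)]))  # -> [(1.0, 1.5), (8.5, 8.5)]
```

With these toy points the assignments stop changing on the second iteration, so the loop exits with the two group means. Emitting `(point, 1)` pairs lets the reducer also accept partial `(sum, count)` values, so in a real Hadoop job a combiner could pre-aggregate each mapper's output before the shuffle.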