An Enhanced Agglomerative Fuzzy K-Means Clustering Method with Mapreduce Implementation on Hadoop Platform

Ruixin Zhang, Yinglin Wang
DOI: https://doi.org/10.1109/pic.2014.6972387
2014-01-01
Abstract: In this paper, an enhanced agglomerative fuzzy K-Means clustering algorithm with a MapReduce implementation is proposed. The algorithm introduces an initial center selection method to improve the accuracy and speed up the convergence of the agglomerative fuzzy K-Means algorithm. A MapReduce implementation based on Apache Hadoop is then presented to improve scalability on large-scale datasets. Experiments were conducted on a synthetic dataset, the WINE dataset from the UCI Repository, and a randomly generated large dataset. The experimental results show that the proposed algorithm can identify the true number of clusters and produce accurate results, with good scalability on large datasets.
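To make the MapReduce idea in the abstract concrete, the following is a minimal single-machine sketch of how one iteration of plain fuzzy K-Means can be split into a map step and a reduce step. It is an illustration under stated assumptions, not the authors' method: it uses the standard fuzzy K-Means membership and center updates, and it omits both the agglomerative variant and the initial center selection that the abstract describes. It also runs in pure Python rather than on Hadoop. The function names (`memberships`, `map_point`, `reduce_cluster`, `fuzzy_kmeans_iteration`) and the fuzzifier default `m=2.0` are hypothetical choices made for this example.

```python
from collections import defaultdict


def memberships(point, centers, m=2.0, eps=1e-12):
    """Standard fuzzy K-Means membership of one point in every cluster.

    Assumption: plain fuzzy K-Means with fuzzifier m > 1, not the
    agglomerative variant discussed in the paper.
    """
    # Squared Euclidean distance to each center (eps avoids division by zero).
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) + eps
          for center in centers]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def map_point(point, centers, m=2.0):
    """Map step: emit (cluster index, (weighted point, weight)) pairs."""
    for j, u_j in enumerate(memberships(point, centers, m)):
        w = u_j ** m
        yield j, ([w * p for p in point], w)


def reduce_cluster(values):
    """Reduce step: merge the partial sums for one cluster into a new center."""
    values = list(values)
    dim = len(values[0][0])
    numerator = [0.0] * dim
    denominator = 0.0
    for weighted_point, w in values:
        for i in range(dim):
            numerator[i] += weighted_point[i]
        denominator += w
    return [x / denominator for x in numerator]


def fuzzy_kmeans_iteration(points, centers, m=2.0):
    """One iteration: map every point, group by cluster index, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for point in points:
        for j, value in map_point(point, centers, m):
            grouped[j].append(value)
    return [reduce_cluster(grouped[j]) for j in range(len(centers))]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers = [(1.0, 1.0), (9.0, 9.0)]
    for _ in range(20):
        centers = fuzzy_kmeans_iteration(points, centers)
    print(centers)
```

Because each point's contribution to a center is an independent weighted sum, the map step can process points in parallel and the reduce step only needs to add partial sums, which is the property that makes this kind of update a natural fit for a framework such as Hadoop.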