KBAC: K-means Based Adaptive Clustering for Massive Dataset

XU Xiao-min, XIAO Yang-hua
DOI: https://doi.org/10.3969/j.issn.1000-1220.2012.10.028
2012-01-01
Abstract: One of the main drawbacks of the K-means clustering algorithm is that the number of clusters must be specified by the user. In most real application scenarios, it is impossible for the user to provide the number of clusters beforehand. On the other hand, K-means lends itself to parallelization, which offers a way to cluster massive datasets efficiently. In this paper, we propose the KBAC algorithm, which adopts K-means as a pre-clustering procedure to cluster massive data adaptively under the MapReduce cloud framework. The main idea of the algorithm is to reduce the problem of clustering in vector space to a community detection problem on a graph. Theoretical and experimental results indicate that KBAC can improve clustering quality and efficiency in a cloud environment.
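The abstract outlines KBAC only at a high level: K-means serves as a pre-clustering step under MapReduce, and the final clusters come from community detection on a graph, so the number of clusters need not be fixed by the user. The following is a minimal sketch of that pipeline under several assumptions that are not taken from the paper: the map and reduce phases are simulated in memory rather than run on a MapReduce cluster; K-means is deliberately run with more centroids than expected clusters and seeded with deterministic farthest-first initialization; centroids are linked in the graph when they lie within a hypothetical `radius`; and label propagation stands in for whatever community detection method KBAC actually uses. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist  # Python 3.8+


def farthest_first_init(points, k):
    """Deterministic seeding (assumption): start at points[0], then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def map_phase(points, centroids):
    """Simulated mapper: emit (nearest-centroid id, (point, 1)) for each point."""
    for p in points:
        cid = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield cid, (p, 1)


def reduce_phase(pairs, old_centroids):
    """Simulated shuffle + reducer: group by centroid id, average the points."""
    groups = defaultdict(list)
    for cid, value in pairs:
        groups[cid].append(value)
    new = list(old_centroids)  # an empty cluster keeps its previous centroid
    for cid, values in groups.items():
        n = sum(count for _, count in values)
        dim = len(values[0][0])
        new[cid] = tuple(sum(p[d] for p, _ in values) / n for d in range(dim))
    return new


def kmeans_mapreduce(points, k, iters=20):
    """Pre-clustering: one simulated MapReduce job per K-means iteration."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        new = reduce_phase(map_phase(points, centroids), centroids)
        if new == centroids:  # converged
            break
        centroids = new
    return centroids


def centroid_graph(centroids, radius):
    """Hypothetical graph construction: link centroids closer than `radius`."""
    adj = {i: set() for i in range(len(centroids))}
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_rounds=100):
    """Asynchronous label propagation; ties go to the smallest label so that
    runs are reproducible (standard LPA breaks ties at random)."""
    labels = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated centroid keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = min(lab for lab, c in counts.items() if c == top)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


def kbac_sketch(points, k, radius):
    """Over-cluster with K-means, then merge centroids via community detection.
    Each point inherits the community label of its nearest centroid."""
    centroids = kmeans_mapreduce(points, k)
    communities = label_propagation(centroid_graph(centroids, radius))
    return [communities[cid] for cid, _ in map_phase(points, centroids)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    labels = kbac_sketch(pts, k=4, radius=3.0)
    print(len(set(labels)))  # 2
```

In this toy run K-means is asked for 4 centroids, yet only 2 clusters come out, because centroids within each blob are linked and merged into one community. The number of clusters is therefore driven by the graph structure (here, by the assumed `radius`) rather than by `k`, which mirrors the adaptive behavior the abstract describes.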