Research on K-medoids clustering algorithm based on data density and its parallel processing based on MapReduce

Aiguo Liu, Shuli Zou, Taorong Qiu, Xiaoming Bai
2016-01-01
Abstract: The k-medoids algorithm selects its k initial cluster centers at random, so its clustering results vary from run to run. To address this problem, we propose combining the k-medoids algorithm with a density-based clustering algorithm. The improved k-medoids algorithm uses the density-based clustering algorithm to automatically generate suitable k cluster centers, which serve as the initial seeds for k-medoids. In addition, because the k-medoids algorithm does not scale well to large data sets, a parallel version of the improved algorithm based on the MapReduce computing model is designed and implemented on the Hadoop platform. The parallel algorithm is tested on several data sets. Experimental results show that the improved k-medoids algorithm clusters more effectively and that the designed parallel processing scales well to large data sets.
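The two ideas in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not specify the density rule, so the rule used here (count neighbours within a radius `eps`, then keep dense points that lie more than `eps` apart) is an assumption, as are the names `density_seeds`, `map_assign`, `reduce_medoid`, `eps`, and `min_pts`. The assignment and update steps are written as map-like and reduce-like functions only to suggest how they might be split across a MapReduce job; no Hadoop API is used.

```python
from collections import defaultdict
from math import dist


def density_seeds(points, eps, min_pts):
    """Pick initial medoids from dense regions (assumed rule, not from the paper)."""
    # Density of a point = number of points within eps of it (itself included).
    density = [sum(dist(p, q) <= eps for q in points) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if density[i] < min_pts:
            break  # all remaining points are too sparse to seed a cluster
        # Keep a dense point only if it is far from every seed chosen so far,
        # so the number of seeds k emerges from the data.
        if all(dist(points[i], s) > eps for s in seeds):
            seeds.append(points[i])
    return seeds


def map_assign(point, medoids):
    """Map-like step: emit (index of the nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda j: dist(point, medoids[j]))
    return nearest, point


def reduce_medoid(members):
    """Reduce-like step: the member with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(dist(c, m) for m in members))


def k_medoids(points, medoids, max_iter=20):
    """Alternate the map-like and reduce-like steps until the medoids stop changing."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            j, member = map_assign(p, medoids)
            groups[j].append(member)
        new_medoids = [
            reduce_medoid(groups[j]) if groups[j] else medoids[j]
            for j in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

Calling `k_medoids(points, density_seeds(points, eps, min_pts))` on a list of coordinate tuples runs the whole pipeline. Because the seeds are chosen deterministically from the data, repeated runs on the same input return the same medoids, which is the property the improved algorithm is meant to provide.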