New density clustering algorithm based on MapReduce

Ningjia QIU, Bin LI, Peng WANG, Huamin YANG, Weiqi WANG
2017-01-01
Journal of Computer Applications
Abstract: To resolve the poor clustering quality caused by empirically chosen parameters and the low execution efficiency of the spatial clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise), the paper proposes an adaptive DBSCAN programming framework based on a genetic algorithm and MapReduce. Using the computing capability of Hadoop, it analyzes the similarity and dissimilarity within the data set so that data serialization can be arranged reasonably. First, minPts (the threshold for a dense region) and Eps (the scanning radius) are set by the genetic algorithm, and the parallel programming framework is built on MapReduce; the obtained thresholds are then used to perform parallel clustering. Finally, the experimental results show that, when processing 10000 data records, the execution efficiency of the improved algorithm (GADBSCANMR) increased by about 3 times compared with the original DBSCAN algorithm, and the clustering quality improved by about 10%. The results indicate that the improved algorithm provides a more accurate method for determining the thresholds of DBSCAN and parallelizes the concrete computation, offering a new research approach to the efficiency and quality problems of clustering algorithms.
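To make the roles of the two thresholds concrete, the following is a minimal single-machine sketch of plain DBSCAN in Python. It is not the paper's GADBSCANMR implementation: the genetic search and the MapReduce parallelization are omitted, and the function names (`region_query`, `dbscan`), the Euclidean distance, and the sample points are assumptions made for illustration only. Here `eps` plays the role of Eps and `min_pts` the role of minPts.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Hypothetical sample data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))
```

The output depends strongly on the two parameters: with a much smaller `eps` every point in this sample falls below `min_pts` neighbors and is labeled as noise. That sensitivity is the reason the paper tunes the thresholds with a genetic algorithm rather than choosing them by experience.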