An Improved Density-Based Cluster Analysis Method Combining Genetic Algorithm and Data Sampling for Large-Scale Datasets

Ye Zonglin, Cao Hui, Wang Miaomiao, Zhang Yanbin
2013-01-01
Abstract: This paper proposes an improved density-based cluster analysis method for large-scale datasets that combines a genetic algorithm with data sampling. First, the method draws samples from the original dataset to form a sampling dataset. Second, density-based spatial clustering of applications with noise (DBSCAN) is run on the sampling dataset together with the genetic algorithm to determine the neighborhood radius (Eps) and the minimum number of points (MinPts), with the Minkowski score serving as the fitness function. Finally, the obtained Eps and MinPts are transformed to account for the difference in scale between the original dataset and the sampling dataset, and DBSCAN is run on the original dataset with the transformed parameters. Three datasets from the UCI Machine Learning Repository are used in the experiments. The experimental results indicate that the proposed method achieves higher clustering capability and that parameter selection is easier and more effective.
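The abstract names the Minkowski score as the fitness function but does not state its formula. The sketch below assumes the commonly used pair-counting definition, MS = sqrt((n01 + n10) / (n11 + n10)), where n11 counts point pairs grouped together in both the reference partition and the clustering, n10 counts pairs grouped together only in the reference, and n01 counts pairs grouped together only in the clustering. The function name `minkowski_score` and its arguments are illustrative, not taken from the paper, and the treatment of DBSCAN noise points is left out because the abstract does not describe it.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score (assumed definition, see lead-in).

    Lower is better; 0.0 means both partitions agree on every pair.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    n11 = n10 = n01 = 0
    # Visit every unordered pair of points once.
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1  # together in both partitions
        elif same_true:
            n10 += 1  # together only in the reference
        elif same_pred:
            n01 += 1  # together only in the clustering

    if n11 + n10 == 0:
        raise ValueError("reference partition has no co-clustered pairs")
    return sqrt((n01 + n10) / (n11 + n10))


# Cluster ids may be permuted; only the pairwise grouping matters.
print(minkowski_score([0, 0, 1, 1], [1, 1, 0, 0]))
```

Under this definition a genetic algorithm would minimize the score over candidate (Eps, MinPts) pairs. The pairwise loop is quadratic in the number of points, which is one reason evaluating the fitness on a sampling dataset rather than on the full dataset is attractive.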