An Improved Clustering Algorithm Based on Density Distribution Function.

Jianhao Tan, Jing Zhang, Weixiong Li
DOI: https://doi.org/10.5539/cis.v3n3p23
2010-01-01
Computer and Information Science
Abstract: The characteristics and disadvantages of traditional density-based clustering algorithms are examined, the current state of research on density-based clustering is discussed, and an improved clustering algorithm based on a density distribution function is proposed. The K nearest neighbors (KNN) of each point are used to measure its density, and a point of locally maximal density is defined as a center point. Using a local scale, classification is extended outward from the center point. Each point is then tested, by means of a radius scale factor, to determine whether it is a core point. Classification is extended again from each core point until the density falls to a given ratio of the center point's density. Several examples of the algorithm are given, and the algorithm is compared experimentally with the grid-shared nearest neighbor (GNN) clustering algorithm in terms of clustering accuracy and efficiency. The tests show that the improved algorithm greatly reduces the sensitivity of density-based clustering to its parameters, improves clustering results on high-dimensional data sets with uneven density distribution, and increases clustering accuracy and efficiency.
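The first two steps described in the abstract, a KNN-based density estimate and the selection of local maximum density points as centers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the density formula, so the inverse of the mean distance to the k nearest neighbors is assumed here, and the function names `knn_density` and `center_points` are hypothetical. The later expansion steps (local scale, radius scale factor, density ratio) are not defined in the abstract and are left out.

```python
import numpy as np


def knn_density(points, k):
    """Estimate a density for each point from its k nearest neighbors.

    Assumption (not from the paper): density is the inverse of the mean
    Euclidean distance to the k nearest neighbors. Points are assumed to
    be distinct, so each point is its own unique nearest match.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Column 0 of the sorted order is the point itself (distance 0); skip it.
    order = np.argsort(dists, axis=1)
    neighbors = order[:, 1:k + 1]
    knn_dists = np.take_along_axis(dists, neighbors, axis=1)
    density = 1.0 / (knn_dists.mean(axis=1) + 1e-12)
    return density, neighbors


def center_points(density, neighbors):
    """Return indices of points at least as dense as all their k nearest neighbors."""
    return [
        i for i in range(len(density))
        if np.all(density[i] >= density[neighbors[i]])
    ]
```

For two well-separated groups of points, each arranged around a middle point, calling `knn_density(points, k)` and passing its two results to `center_points` would be expected to pick out the two middle points as candidate centers.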