Clustering Algorithm Based on Grid Density and Distance Information Characteristics

Wei-di Dai, Lu Zhang, Wen-jun Wang, Yue-xian Hou
DOI: https://doi.org/10.3321/j.issn:1000-565X.2009.04.004
2009-01-01
Abstract: When most grid- and density-based clustering algorithms are applied to a real data set with a skewed distribution, they fail to produce effective clusters because of the monotonic search they employ. To solve this problem, a new clustering algorithm, GDD, based on grid density and distance is proposed. In GDD, the data space is divided into grid cells, and a transition function of the distance from the current cluster center is constructed. The density transition ratios of grid cells in the local area are then compared with the transition function value computed for the current grid cell to determine whether the current cluster should be extended. Experiments using this transition function are conducted on real and synthetic data sets. The results show that the proposed algorithm is insensitive to noisy data, can discover clusters of arbitrary shape, has time complexity linear in the number of grid cells, and is suitable for clustering large-scale real data sets.
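The grid-and-transition idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GDD implementation: the abstract does not give the form of the transition function, so `transition_threshold` (with its `base` and `growth` parameters), the `min_density` cutoff, the use of the seed cell as the cluster center, and the axis-aligned neighborhood are all assumptions made for illustration. The sketch also sorts cells by density to pick seeds, so it does not reproduce the linear time complexity reported in the abstract.

```python
import math
from collections import deque


def grid_density(points, cell_size):
    """Count the points that fall in each grid cell (keys are integer cell coordinates)."""
    counts = {}
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        counts[key] = counts.get(key, 0) + 1
    return counts


def transition_threshold(distance, base=0.3, growth=0.1):
    """Hypothetical transition function (an assumption, not taken from the paper).

    Returns the minimum neighbor/current density ratio required to extend
    a cluster at the given distance (in cells) from the cluster's seed cell.
    """
    return min(1.0, base + growth * distance)


def neighbors(cell):
    """Yield the axis-aligned neighbors of a cell."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def cluster_cells(counts, min_density=2):
    """Grow clusters from dense seed cells; cells left unlabeled are treated as noise."""
    labels = {}
    next_label = 0
    # Visit candidate seeds from densest to sparsest.
    for seed in sorted(counts, key=counts.get, reverse=True):
        if seed in labels or counts[seed] < min_density:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            for nb in neighbors(current):
                if nb in labels or nb not in counts:
                    continue
                ratio = counts[nb] / counts[current]  # density transition ratio
                dist = math.dist(nb, seed)            # distance from the seed cell
                if ratio >= transition_threshold(dist):
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


points = [
    (0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.7, 0.2),  # dense cell (0, 0)
    (1.1, 0.1), (1.3, 0.4), (1.5, 0.5),              # adjacent cell (1, 0)
    (10.1, 10.1), (10.2, 10.2), (10.5, 10.5),        # separate cluster
    (5.5, 5.5),                                      # isolated point
]
counts = grid_density(points, cell_size=1.0)
labels = cluster_cells(counts)
```

In this toy input, the two adjacent dense cells end up in one cluster, the distant dense cell forms a second cluster, and the isolated point's cell stays unlabeled. Making the required ratio grow with distance is one plausible way to stop a cluster from spreading indefinitely through slowly thinning regions; the paper's actual function may differ.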