CURE-NS: a hierarchical clustering algorithm with new shrinking scheme

Yuntao Qian, Qingsong Shi, Qi Wang
DOI: https://doi.org/10.1109/ICMLC.2002.1174512
2002-01-01
Abstract: CURE (clustering using representatives) is an efficient clustering algorithm for large databases. It is more robust to outliers than other clustering methods, and it identifies clusters that have non-spherical shapes and wide variances in size. CURE employs a fixed number of representative points to describe each cluster; the set of representative points is first chosen randomly and then shrunk toward the mean of the cluster. The shrinking operation plays a key role in CURE because it weakens the effect of outliers. However, we found that the shrinking scheme of CURE depends on a hidden assumption that clusters are spherical, so CURE has difficulty dealing with databases whose clusters have specific shapes. In this paper, CURE-NS (CURE with new shrinking scheme) is proposed to overcome this problem. It uses the difference between the density values of the representative points to determine the direction and distance of shrinking, so the shrinking scheme is independent of the shape of the cluster. A range of experiments demonstrates that CURE-NS has better clustering performance than CURE.
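As a rough illustration of the shrinking step the abstract refers to, the sketch below moves each representative point a fixed fraction of the way toward the cluster mean, which is how the original CURE scheme is commonly described. The function names and the parameter name `alpha` (the shrink factor) are assumptions made for this example, and the code does not implement the density-based rule of CURE-NS, whose details are not given in the abstract.

```python
from typing import List, Sequence

Point = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Point:
    """Return the coordinate-wise mean of a non-empty set of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def shrink_representatives(
    reps: Sequence[Sequence[float]],
    mean: Sequence[float],
    alpha: float,
) -> List[Point]:
    """Move every representative a fraction `alpha` of the way to `mean`.

    alpha = 0 leaves the points unchanged; alpha = 1 collapses them
    onto the mean. (Hypothetical helper, written for illustration.)
    """
    return [
        [r[d] + alpha * (mean[d] - r[d]) for d in range(len(mean))]
        for r in reps
    ]


# An elongated cluster: long along the x axis, short along the y axis.
cluster = [[-10.0, 0.0], [10.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mean = centroid(cluster)
shrunk = shrink_representatives(cluster, mean, alpha=0.5)
```

With `alpha=0.5`, the two end points of the elongated cluster move 5 units each while the two side points move only 0.5 units. The displacement is proportional to the distance from the mean, so points at the far ends of a non-spherical cluster are pulled inward much more than points on its narrow sides. This is one way to picture the shape dependence the abstract mentions.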