A Hierarchical Clustering Algorithm Based on Noise Removal

Dongdong Cheng, Qingsheng Zhu, Jinlong Huang, Quanwang Wu, Lijun Yang
DOI: https://doi.org/10.1007/s13042-018-0836-3
2018-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Noise is irrelevant or meaningless data that hinders most types of data analysis. Existing clustering algorithms seldom take noise points into consideration and cannot detect arbitrary-shaped clusters. This paper presents a Hierarchical Clustering algorithm Based on Noise Removal (HCBNR), which is robust against noise points and good at discovering clusters with arbitrary shapes. First, natural neighbor-based density is used to remove noise points from the data set. Then a saturated neighbor graph is constructed on the remaining points, and a novel modularity-based graph partitioning algorithm divides the graph into small clusters. Finally, the small clusters are repeatedly merged according to a novel similarity metric between clusters until the desired number of clusters is obtained. Experimental results on synthetic and real data sets show that the method accurately identifies noise points and obtains better clustering results than existing clustering algorithms when discovering arbitrary-shaped clusters.
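The abstract does not define the natural neighbor-based density, so the following is only a rough sketch of the first step (noise removal) under stated assumptions. It uses one common formulation of the natural neighbor search: the neighborhood size grows round by round until the number of points that nobody has chosen as a neighbor stops changing. Points that are still unchosen at that moment are flagged as noise candidates. The function names `reverse_neighbor_counts` and `flag_noise`, and the zero-count criterion itself, are hypothetical stand-ins for illustration, not the paper's density measure.

```python
import numpy as np


def reverse_neighbor_counts(X):
    """Grow the neighborhood size until the set of unchosen points stabilizes.

    Returns (counts, k): counts[i] is how many points list i among their
    k nearest neighbors, and k is the neighborhood size at which the search
    stopped. This is an assumed, simplified natural neighbor search.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Stable sort keeps tie-breaking deterministic; drop the last column (self).
    order = np.argsort(dist, axis=1, kind="stable")[:, : n - 1]

    counts = np.zeros(n, dtype=int)
    prev_unchosen = -1
    k = 0
    for r in range(n - 1):
        # Every point votes for its (r + 1)-th nearest neighbor.
        np.add.at(counts, order[:, r], 1)
        k = r + 1
        unchosen = int(np.sum(counts == 0))
        # Stop when everyone has been chosen or the unchosen set stops shrinking.
        if unchosen == 0 or unchosen == prev_unchosen:
            break
        prev_unchosen = unchosen
    return counts, k


def flag_noise(X):
    """Assumed criterion: a point with no reverse neighbors is a noise candidate."""
    counts, _ = reverse_neighbor_counts(X)
    return counts == 0


if __name__ == "__main__":
    # Four points on a unit square plus one distant point.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]])
    print(flag_noise(X))
```

In the pipeline the abstract describes, the points that survive this filter would then be passed to graph construction, partitioning, and merging; those later steps are not sketched here because the abstract gives no detail on the saturated neighbor graph or the similarity metric.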