Density-based multiscale analysis for clustering in strong noise settings

Zhang Tiantian, Yuan Bo
DOI: https://doi.org/10.1007/978-3-319-63004-5_3
2017-01-01
Abstract: Finding clustering patterns in data is challenging when clusters can be of arbitrary shape and the data contains a high percentage (e.g., 80%) of noise. This paper presents a novel technique named density-based multiscale analysis for clustering (DBMAC) that performs noise-robust clustering without any strict assumption on the shapes of clusters. First, DBMAC calculates the r-neighborhood statistics for a range of radius values r. Next, instead of trying to find a single optimal r value, it identifies a set of radius values appropriate for separating “clustered” objects from “noisy” objects, using a formal statistical test for multimodality. Finally, the classical DBSCAN is employed to perform clustering on the resulting subset of the data, which contains significantly less noise. Experimental results confirm that DBMAC is superior to classical DBSCAN in strong noise settings and also outperforms the more recent technique SkinnyDip when the data contains arbitrarily shaped clusters. © Springer International Publishing AG 2017.
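The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: an Otsu-style separability score stands in for the formal multimodality test, a single best radius is used rather than a set of radii, and the function names (`neighbor_counts`, `separability`, `dbmac_sketch`), the candidate radii, `eps`, `min_pts`, and the toy data are all hypothetical. The toy data uses a sparse regular background, so its noise level is well below the 80% mentioned above.

```python
import math

def neighbor_counts(points, r):
    """For each point, count the other points within distance r (O(n^2))."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        for i, p in enumerate(points)
    ]

def separability(counts):
    """Otsu-style score: best between-class variance / total variance.

    Assumption: a simple stand-in for a formal multimodality test.
    Returns (score, threshold); counts <= threshold are treated as noise.
    """
    n = len(counts)
    mean = sum(counts) / n
    total_var = sum((c - mean) ** 2 for c in counts) / n
    best_score, best_t = 0.0, None
    if total_var == 0:
        return best_score, best_t
    for t in sorted(set(counts))[:-1]:
        low = [c for c in counts if c <= t]
        high = [c for c in counts if c > t]
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = (len(low) / n) * (len(high) / n) * (m0 - m1) ** 2 / total_var
        if score > best_score:
            best_score, best_t = score, t
    return best_score, best_t

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nbrs)
    return labels

def dbmac_sketch(points, radii, eps, min_pts):
    """Drop likely noise at the most separable radius, then cluster the rest."""
    best = (0.0, None, None, None)  # (score, radius, threshold, counts)
    for r in radii:
        counts = neighbor_counts(points, r)
        score, t = separability(counts)
        if t is not None and score > best[0]:
            best = (score, r, t, counts)
    _, best_r, t, counts = best
    if best_r is None:  # no radius separated the data at all
        return None, [], []
    kept = [p for p, c in zip(points, counts) if c > t]
    return best_r, kept, dbscan(kept, eps, min_pts)

# Toy data (hypothetical): two dense 5x5 grids plus a sparse background grid.
cluster_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
cluster_b = [(6 + 0.1 * i, 6 + 0.1 * j) for i in range(5) for j in range(5)]
noise = [(-3 + 2 * i, -3 + 2 * j) for i in range(6) for j in range(6)]
points = cluster_a + cluster_b + noise

best_r, kept, labels = dbmac_sketch(
    points, radii=[0.15, 0.25, 0.6, 2.5], eps=0.15, min_pts=4
)
print(best_r, len(kept), sorted(set(labels)))
```

On this toy data the neighbor counts at the selected radius fall into two well-separated groups, so the filtering step removes the background points before DBSCAN runs. On real data with heavy noise, the choice of candidate radii and of the multimodality test would matter far more than it does here.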