Dboost: A Fast Algorithm for DBSCAN-Based Clustering on High Dimensional Data

Yuxiao Zhang, Xiaorong Wang, Bingyang Li, Wei Chen, Tengjiao Wang, Kai Lei
DOI: https://doi.org/10.1007/978-3-319-31750-2_20
2016-01-01
Abstract: DBSCAN is a classic density-based clustering technique, well known for discovering clusters of arbitrary shape and for handling noise. However, its density calculation is very time-consuming on high-dimensional data, which makes it inefficient in many application areas, such as multi-document summarization and product recommendation. Efficiently calculating density on high-dimensional data is therefore a key issue for DBSCAN-based clustering. In this paper, we propose Dboost, a fast algorithm for DBSCAN-based clustering on high-dimensional data. In our algorithm, \(WAND^\#\), an adaptation of a ranked retrieval technique, is applied in a novel way to speed up the density calculations without loss of accuracy, and we further improve this acceleration by reducing the number of times \(WAND^\#\) is invoked. Experiments were conducted on wire voltage data, the Netflix dataset, and microblog corpora. The results show a speedup of over 50 times on the wire voltage data and the Netflix dataset, and a speedup of more than 100 times can be expected on microblog data.
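The density calculation that the abstract identifies as the bottleneck can be illustrated with a brute-force baseline. The sketch below is not Dboost and does not implement \(WAND^\#\); it only shows the standard DBSCAN notion of density (the number of points within distance eps of each point) computed naively, which costs on the order of n^2 * d distance operations for n points in d dimensions. The function names `neighbor_counts` and `core_points`, the Euclidean metric, and the sample data are assumptions made for illustration.

```python
import math

def neighbor_counts(points, eps):
    """Brute-force eps-neighborhood sizes (hypothetical helper).

    Every pair of points is compared, so the work grows as n^2 * d.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        for j in range(n):
            # math.dist computes the Euclidean distance (Python 3.8+).
            if math.dist(points[i], points[j]) <= eps:
                counts[i] += 1  # each point also counts itself
    return counts

def core_points(points, eps, min_pts):
    """Indices of points whose eps-neighborhood holds at least min_pts points."""
    counts = neighbor_counts(points, eps)
    return [i for i, c in enumerate(counts) if c >= min_pts]

# Illustrative data: three nearby points and one far-away outlier.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0)]
counts = neighbor_counts(points, eps=0.2)
cores = core_points(points, eps=0.2, min_pts=3)
print(counts)  # [3, 3, 3, 1]
print(cores)   # [0, 1, 2]
```

An accelerated method would aim to return the same neighborhood counts while evaluating far fewer full distance computations, which is the sense in which the abstract speaks of acceleration without accuracy loss.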