A fast DBSCAN algorithm using a bi-directional HNSW index structure for big data

Shaoyuan Weng, Zongwen Fan, Jin Gou
DOI: https://doi.org/10.1007/s13042-024-02104-8
2024-03-05
International Journal of Machine Learning and Cybernetics
Abstract: The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is one of the most popular and effective density-based clustering algorithms. Although it can identify clusters of arbitrary shape as well as noise points, it handles large-scale data inefficiently. The time complexity of DBSCAN is O(n^2), and most of its computation time is spent on ε-neighbor range queries, which become the performance bottleneck. To solve this problem, we propose a simple, fast DBSCAN algorithm, called bh-DBSCAN, which uses a bi-directional HNSW index structure to improve the efficiency of DBSCAN by reducing redundant ε-neighbor range queries. Specifically, we first determine each point's property (core point or border point). Next, we apply the filtNoise algorithm to filter out the noise points, that is, points with no core point in their ε-neighborhood. Finally, we use the MergeCore algorithm to merge each border point into the cluster of its neighboring core points. The experimental results show that the proposed algorithm greatly improves clustering efficiency without losing much accuracy on the datasets tested.
computer science, artificial intelligence
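The core, border, and noise distinction described in the abstract can be illustrated with a minimal brute-force sketch. This is not the authors' bh-DBSCAN, filtNoise, or MergeCore: it uses no HNSW index, and the names `region_query`, `classify_points`, `eps`, and `min_pts` are assumptions chosen for illustration. It only shows the ε-neighbor range query that makes plain DBSCAN O(n^2), since each of the n points scans all n points.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Brute-force eps-neighbor range query: one O(n) scan per point."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (illustrative names)."""
    # n range queries of O(n) each: the O(n^2) bottleneck.
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    # A core point has at least min_pts points (itself included) within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i, nb in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nb):
            # Not core, but within eps of some core point.
            labels.append("border")
        else:
            # No core point in the eps-neighborhood.
            labels.append("noise")
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.9, 0.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

An approximate nearest-neighbor index such as HNSW would replace the linear scan inside `region_query`; how the paper avoids redundant queries with its bi-directional variant is not shown here.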