Scaling Up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique

Guan Ji-hong, Zhou Shui-geng, Bian Fu-ling, He Yan-xiang
DOI: https://doi.org/10.1007/bf03160286
2001-01-01
Abstract: Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and it has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
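The abstract does not give the details of either SDBSCAN variant, so the following is only an illustrative sketch of the general "sampling outside DBSCAN" idea, not the authors' algorithm. It assumes one plausible scheme: draw a random sample, run plain DBSCAN on the sample, then give each remaining point the label of its nearest clustered sample point if that point lies within eps. The names `dbscan`, `sampled_dbscan`, `sample_fraction`, and `seed` are hypothetical, and the brute-force neighbor search stands in for the spatial index a real system would use.

```python
import math
import random

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

def sampled_dbscan(points, eps, min_pts, sample_fraction, seed=0):
    """Assumed scheme: cluster a random sample, then label the rest
    by the nearest clustered sample point within eps."""
    rng = random.Random(seed)
    n = len(points)
    k = max(1, int(n * sample_fraction))
    sample_idx = rng.sample(range(n), k)
    sample = [points[i] for i in sample_idx]
    sample_labels = dbscan(sample, eps, min_pts)

    labels = [NOISE] * n
    for i, lab in zip(sample_idx, sample_labels):
        labels[i] = lab

    sampled = set(sample_idx)
    for i in range(n):
        if i in sampled:
            continue
        best, best_d = NOISE, eps
        for s, lab in zip(sample, sample_labels):
            if lab == NOISE:
                continue
            d = math.dist(points[i], s)
            if d <= best_d:
                best, best_d = lab, d
        labels[i] = best
    return labels
```

Because sampling thins the data, a caller would normally lower `min_pts` roughly in proportion to `sample_fraction`; the sketch leaves that choice to the caller rather than guessing at how the paper handles it.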