An Efficient Density-Based Clustering for Multi-Dimensional Database
Lieliang Zhang, Zhiyang Li, Weijiang Liu, Wenyu Qu, Yinan Wu
DOI: https://doi.org/10.1109/iccss.2017.8091440
2017-01-01
Abstract: Cluster analysis aims to classify data elements into categories according to their similarity. It is a common task in data mining and is useful in various fields, including pattern recognition, machine learning, and information retrieval. Because the area has been studied extensively, many clustering methods have been proposed in the literature, and some of them focus on mining clusters with arbitrary shapes. However, for large-scale, multi-dimensional data there is still a need for an efficient and versatile clustering method that can identify arbitrarily shaped clusters embedded in multi-dimensional space. In this paper, we propose a density-based clustering algorithm that adopts a divide-and-conquer strategy. To handle large-scale, multi-dimensional data, we first partition the data into grid cells; this partitioning remains efficient in large-scale cases where other algorithms often fail. Moreover, rather than tuning the grid cell width, we present a way to determine it automatically. We then propose a flood-fill-like algorithm that identifies clusters with arbitrary shapes over these grid cells. Finally, extensive experiments on both synthetic and real-world databases show that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.
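The general idea of partitioning points into grid cells and then flood-filling over adjacent occupied cells can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `grid_flood_fill_cluster`, the density threshold `min_pts`, and the choice of treating all cells that differ by at most one step per dimension as neighbors are assumptions made for illustration. The abstract does not describe how the cell width is determined automatically, so `cell_width` is passed in by hand here.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_flood_fill_cluster(points, cell_width, min_pts=1):
    """Label points by flood-filling over dense, adjacent grid cells.

    Illustrative sketch only. Returns a list of cluster ids aligned
    with `points`; -1 marks points that fall in non-dense cells.
    """
    if not points:
        return []
    dim = len(points[0])

    # Divide: bucket point indices by the integer coordinates of their cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_width) for x in p)
        cells[key].append(idx)

    # Assumed density rule: a cell is dense if it holds at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # Offsets to every adjacent cell (3**dim - 1 of them), skipping the zero
    # offset. This enumeration grows exponentially with dim, so it only
    # suits low-dimensional illustrations.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    labels = [-1] * len(points)
    cell_label = {}
    next_id = 0

    # Conquer: breadth-first flood fill across adjacent dense cells.
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for idx in cells[cur]:
                labels[idx] = next_id
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return labels
```

For example, with the points `[(0.1, 0.1), (0.2, 0.3), (1.1, 0.4), (5.0, 5.0), (5.2, 5.1), (9.9, 0.0)]`, `cell_width=1.0`, and `min_pts=1`, the first three points land in two adjacent cells and share one label, while the remaining points form two further clusters. Raising `min_pts` to 2 marks the isolated points as noise (`-1`). Because only occupied cells are stored in the dictionary, memory scales with the number of non-empty cells rather than with the full grid.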