A Fast Subspace Clustering Algorithm Based on Pattern Similarity

Yanglan Gan, Jihong Guan, Hao Wang
DOI: https://doi.org/10.1109/FSKD.2007.24
2007-01-01
Abstract: Traditional clustering models define similarity by distance over dimensions. However, distance functions are not always adequate for capturing correlations among objects. Pattern-based clustering can discover clusters of this kind, but state-of-the-art pattern-based clustering methods are inefficient and lack criteria for evaluating cluster quality. This paper presents a novel pattern similarity-based subspace clustering method built on the pattern tree (PPSC for short) that addresses both shortcomings. The method uses new evaluation criteria to discover the best clusters, which enables users to find clusters according to different needs. Meanwhile, observing the analogy between mining frequent itemsets and discovering subspace clusters around random points, we apply the pattern tree to determine subspaces in a single scan of the database, so the method performs efficiently on large datasets.
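The claim that distance can miss correlations among objects can be illustrated with a small sketch. It is not the PPSC algorithm or its evaluation criteria, which the abstract does not specify. The helper `shift_spread` and the sample vectors are hypothetical, chosen only to show that two objects following the same pattern on a subset of dimensions can be far apart in Euclidean distance.

```python
import math

def euclidean(a, b):
    """Ordinary Euclidean distance over all dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shift_spread(a, b, dims):
    """Hypothetical pattern score (an assumption, not from the paper):
    the range of the differences a[d] - b[d] over the chosen dimensions.
    A value of 0 means b is a pure shifted copy of a on those dimensions."""
    diffs = [a[d] - b[d] for d in dims]
    return max(diffs) - min(diffs)

# Illustrative data only.
x = [1.0, 5.0, 2.0, 9.0]
y = [11.0, 15.0, 12.0, 3.0]  # x shifted by +10 on dims 0..2, unrelated on dim 3
z = [2.0, 4.0, 3.0, 8.0]     # near x in distance, but with no consistent shift

subspace = [0, 1, 2]

print(euclidean(x, y), euclidean(x, z))   # y is much farther from x than z is
print(shift_spread(x, y, subspace))       # x and y share a pattern on the subspace
print(shift_spread(x, z, subspace))       # x and z do not
```

Under this toy score, `x` and `y` would be grouped together in the subspace of the first three dimensions even though a distance-based model would prefer `x` and `z`.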