A New Fast Minimum Spanning Tree-Based Clustering Technique

Xiaochun Wang, Xia Li Wang, Jihua Zhu
DOI: https://doi.org/10.1109/ICDMW.2014.139
2014-01-01
Abstract: Clustering has many important applications in data mining, and many techniques have been developed for it. Today's real-world databases typically contain millions of items with many thousands of fields, producing datasets that can reach terabytes in size. On such data, many traditional clustering techniques are increasingly limited, and novel, computationally efficient approaches have become increasingly popular. In this paper, we propose a new, efficient two-phase approach to graph-theoretical clustering that uses a minimum spanning tree (MST) representation of a dataset. In the first phase, we modify the standard Prim's algorithm so that the tree can be constructed efficiently using k-nearest-neighbor search; during this construction, a new edge weight is defined to maximize the intra-cluster similarity and minimize the inter-cluster similarity of the dataset. In the second phase, based on the intuition that data points in the same cluster are closer to each other than to points in different clusters, the longest edges of the MST obtained in the first phase are removed to form clusters, as standard MST-based clustering algorithms do. Experiments on synthetic and real datasets show that the proposed approach performs well compared with state-of-the-art methods.
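The second phase builds on the standard MST-based clustering scheme that the abstract refers to. The sketch below shows only that baseline, under stated assumptions: an exact O(n^2) Prim's algorithm on plain Euclidean distances over the complete graph, followed by cutting the k-1 longest tree edges to obtain k clusters. It does not reproduce the paper's k-nearest-neighbor modification of Prim's algorithm or its new edge weight, which the abstract does not specify. The function names `prim_mst` and `mst_clusters` and the fixed cluster count `k` are hypothetical choices for illustration (requires Python 3.8+ for `math.dist`).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def prim_mst(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Baseline O(n^2) Prim's algorithm on the complete Euclidean graph.

    Assumption: plain Euclidean edge weights, not the paper's new weight.
    Returns the MST edges as (weight, u, v) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the non-tree vertex with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax the connecting edges of the remaining non-tree vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points: List[Point], k: int) -> List[int]:
    """Remove the k-1 longest MST edges and label the resulting components."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(points)
    edges = sorted(prim_mst(points))  # ascending by weight
    kept = edges[: max(0, len(edges) - (k - 1))]  # drop the k-1 longest edges

    # Union-find over the kept edges to recover connected components.
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    labels: List[int] = []
    ids: dict = {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(pts, 2))  # [0, 0, 0, 1, 1, 1]
```

With two well-separated groups, the single long edge bridging them is the longest MST edge, so cutting it separates the groups. The complete-graph Prim step is what makes this baseline quadratic; replacing it with a nearest-neighbor-driven construction is the kind of speedup the first phase targets.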