Hierarchical Subtrees Agglomerative Clustering Algorithms

LI Yu-jian
DOI: https://doi.org/10.3969/j.issn.0254-0037.2006.05.012
2006-01-01
Abstract: Traditional Hierarchical Agglomerative Clustering Algorithms (HACA) may produce a non-unique binary tree as the clustering result for the same dataset. To solve this problem, this paper presents the Hierarchical Subtrees Agglomerative Clustering Algorithm (HSACA). Its basic idea is to find the maximal θ-distant subtrees in a minimal spanning tree of the dataset and merge their vertex sets. HSACA can merge many objects into one cluster in a single step, and its clustering result is usually a multiple tree (a tree whose nodes may have more than two children) rather than a binary tree. The paper proves that, ignoring the order of branches, the multiple tree generated by HSACA is unique for a given dataset. Computer simulations show that when the dataset contains many equidistant pairs of points, this multiple tree describes a more reasonable clustering result than the binary tree generated by traditional HACA.
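The merging idea can be illustrated with a minimal sketch. It assumes Euclidean distance and reads a "maximal θ-distant subtree" as a maximal connected group of minimal-spanning-tree edges that all have the same length θ. The groups are processed in increasing order of θ, and every tie at one level is merged at once. This interpretation and the names `mst_edges` and `hsaca_sketch` are hypothetical illustrations, not the author's published algorithm. Squared distances are used so that ties between integer-coordinate points compare exactly.

```python
import math
from itertools import groupby


def mst_edges(points):
    """Prim's algorithm on the complete graph; returns (squared_length, u, v) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest squared distance from the tree to each point
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = sum((a - b) ** 2 for a, b in zip(points[u], points[v]))
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def hsaca_sketch(points):
    """Hypothetical HSACA-style merging: all MST edges of equal length theta
    are merged in one step, so a node may get more than two children.
    Returns a list of (theta, new_node_id, child_node_ids); leaves are 0..n-1."""
    n = len(points)
    parent = list(range(n))  # union-find over point indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    node_of = {i: i for i in range(n)}  # cluster root -> current tree node id
    next_id = n
    merges = []
    for sq, group in groupby(sorted(mst_edges(points)), key=lambda e: e[0]):
        group = list(group)
        # Clusters touched at this level, recorded before any union happens.
        old_roots = sorted({find(x) for _, u, v in group for x in (u, v)})
        for _, u, v in group:
            parent[find(u)] = find(v)
        # Group the old clusters by the new cluster they now belong to.
        children = {}
        for r in old_roots:
            children.setdefault(find(r), []).append(node_of[r])
        for new_root, kids in children.items():
            merges.append((math.sqrt(sq), next_id, sorted(kids)))
            node_of[new_root] = next_id
            next_id += 1
    return merges
```

For example, `hsaca_sketch([(0, 0), (1, 0), (2, 0), (10, 0)])` returns `[(1.0, 4, [0, 1, 2]), (8.0, 5, [3, 4])]`. The three equally spaced points join one node in a single step, whereas a binary HACA would have to choose which tied pair to merge first. Because the clusters formed by MST edges of length at most θ do not depend on which MST is chosen, this sketch yields the same multiple tree regardless of tie-breaking.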