Scalable clustering by aggregating representatives in hierarchical groups

Wen-Bo Xie, Zhen Liu, Debarati Das, Bin Chen, Jaideep Srivastava
DOI: https://doi.org/10.1016/j.patcog.2022.109230
IF: 8
2022-12-08
Pattern Recognition
Abstract: Handling the scalability of clustering appropriately is a long-standing challenge in the study of clustering techniques and is of fundamental interest to researchers in data mining and knowledge discovery. Compared with other clustering methods, hierarchical clustering produces more interpretable results but scales poorly to large datasets, so the problem calls for more comprehensive study. This paper develops a new scalable hierarchical clustering model, called Election Tree, which detects the most representative point of each sub-cluster through node election on split data and adjusts sub-cluster membership through node merging and swap operations. Extensive experiments on real-world datasets show that the proposed framework achieves better clustering accuracy than competing baseline methods. In scalability tests on incremental synthetic datasets, the new model also takes significantly less time than state-of-the-art hierarchical clustering models such as PERCH, GRINCH, and SCC, as well as other classic baselines.
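To make the idea of "aggregating representatives in hierarchical groups" concrete, here is a minimal sketch under stated assumptions. It is not the Election Tree algorithm from the paper. The helpers `elect`, `split`, `swap` and `build_levels`, the medoid-based election rule, the fixed `group_size` chunking, and the k-medoids-style reassignment that stands in for the paper's swap operation are all hypothetical. Node merging is not modeled. The data is split into small groups, and each group elects a representative: the member with the smallest total distance to the others. Points are then reassigned to their nearest representative, and the representatives become the input to the next level. This repeats until a single root remains.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def elect(group: Sequence[Point]) -> Point:
    """Hypothetical election rule: pick the medoid, i.e. the member whose
    summed distance to all other members is smallest (first one wins ties)."""
    return min(group, key=lambda c: sum(math.dist(c, p) for p in group))


def split(points: List[Point], size: int) -> List[List[Point]]:
    """Chunk the input into consecutive groups of at most `size` points."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def swap(groups: List[List[Point]]) -> List[List[Point]]:
    """Stand-in for a swap step (assumption): move every point to the group
    whose elected representative is nearest, then drop empty groups."""
    reps = [elect(g) for g in groups]
    new_groups: List[List[Point]] = [[] for _ in reps]
    for g in groups:
        for p in g:
            j = min(range(len(reps)), key=lambda k: math.dist(p, reps[k]))
            new_groups[j].append(p)
    return [g for g in new_groups if g]


def build_levels(points: Sequence[Point], group_size: int = 4) -> List[List[Point]]:
    """Build the hierarchy bottom-up: each level holds the representatives
    elected from the level below, until only one root remains."""
    assert group_size >= 2, "groups must hold at least two points to shrink"
    levels = [list(points)]
    current = list(points)
    while len(current) > 1:
        groups = swap(split(current, group_size))
        current = [elect(g) for g in groups]
        levels.append(current)
    return levels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for depth, level in enumerate(build_levels(data, group_size=3)):
        print(depth, level)
```

With two well-separated blobs and `group_size=3`, the first level elects one representative per blob and the next level reduces them to a single root. Each level has at most `ceil(n / group_size)` points, so the loop terminates whenever `group_size >= 2`.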
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic