A Hierarchical Clustering Algorithm Based on K-Means with Constraints

GuoYan Hang, DongMei Zhang, Jiadong Ren, ChangZhen Hu
DOI: https://doi.org/10.1109/icicic.2009.18
2009-01-01
Abstract: Hierarchical clustering is one of the most important tasks in data mining. However, existing hierarchical clustering algorithms are time-consuming and yield low clustering quality because they ignore constraints. This paper proposes a Hierarchical Clustering Algorithm based on K-means with Constraints (HCAKC). To improve clustering efficiency, HCAKC defines an Improved Silhouette to determine the optimal number of clusters. To improve hierarchical clustering quality, it uses the existing pairwise must-link and cannot-link constraints to update the cohesion matrix between clusters. A penalty factor is introduced to modify the similarity metric so that constraint violations are addressed. Experimental results show that HCAKC has lower computational complexity and better clustering quality than the existing algorithm CSM.
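The abstract does not give the penalty formula, so the following is only a minimal sketch of how pairwise must-link and cannot-link constraints might adjust the similarity between two clusters. The function name `penalized_similarity`, the additive scheme, and the `penalty` parameter are all assumptions made for illustration, not the authors' definition.

```python
from itertools import product


def penalized_similarity(cluster_a, cluster_b, base_similarity,
                         must_link, cannot_link, penalty=0.5):
    """Adjust a base inter-cluster similarity using pairwise constraints.

    Hypothetical additive scheme (not taken from the paper):
      - each cannot-link pair spanning the two clusters lowers the
        similarity, since merging them would violate that constraint;
      - each must-link pair spanning the two clusters raises it, since
        keeping them apart would violate that constraint.

    cluster_a, cluster_b: iterables of point ids.
    must_link, cannot_link: sets of frozenset({i, j}) pairs.
    """
    adjusted = base_similarity
    for i, j in product(cluster_a, cluster_b):
        pair = frozenset((i, j))
        if pair in cannot_link:
            adjusted -= penalty
        elif pair in must_link:
            adjusted += penalty
    return adjusted


# Two small clusters of point ids with one constraint of each kind.
a, b = {0, 1}, {2, 3}
must = {frozenset((0, 2))}
cannot = {frozenset((1, 3))}

only_cannot = penalized_similarity(a, b, 1.0, set(), cannot)
only_must = penalized_similarity(a, b, 1.0, must, set())
both = penalized_similarity(a, b, 1.0, must, cannot)
```

In an agglomerative loop, a value like this could replace the raw entry of the cohesion matrix before choosing which pair of clusters to merge next. How HCAKC actually combines the penalty with its cohesion measure is described in the full paper, not in this abstract.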