The Significance of Kappa and F-score in Clustering Ensemble: A Comprehensive Analysis

Jie Yan, Xin Liu, Ji Qi, Tianyan You, Zhongyuan Zhang
DOI: https://doi.org/10.21203/rs.3.rs-3005071/v1
2023-01-01
Abstract: Clustering ensemble techniques have gained significant attention because they can improve the accuracy and robustness of partition results. Selective clustering ensemble (SCE) and weighted clustering ensemble (WCE) methods further improve performance by selecting and weighting base partitions or clusters according to their diversity and stability. However, striking a balance between these two factors remains challenging, and the primary difficulty lies in evaluating the quality of base partitions and clusters. Existing evaluation criteria, such as normalized mutual information (NMI) and its variants, suffer from inherent flaws, including the symmetric problem, the context meaning problem, and disregard for the importance of small clusters. To address these limitations, this paper proposes a novel evaluation method based on kappa and F-score. We introduce a new SCE method that uses kappa to select informative base partitions and F-score to weight clusters according to their stability. Empirical validation on real datasets demonstrates the effectiveness and efficiency of the proposed approach.
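As a rough illustration of the two measures named in the abstract, the sketch below computes a kappa score between two partitions and an F-score for a single cluster against a reference partition. It is not the paper's formulation: the abstract does not give the exact definitions, so this assumes kappa is Cohen's kappa over pairwise "same cluster / different cluster" decisions (which makes it independent of label names) and that a cluster's F-score is its best F1 match against any cluster of the reference partition. The function names `pairwise_kappa` and `cluster_f_score` are hypothetical.

```python
from itertools import combinations


def pairwise_kappa(labels_a, labels_b):
    """Cohen's kappa over pairwise co-membership decisions (assumed definition)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least two objects")
    # For every pair of objects: do the two partitions put them together?
    same_a = [labels_a[i] == labels_a[j] for i, j in pairs]
    same_b = [labels_b[i] == labels_b[j] for i, j in pairs]
    observed = sum(x == y for x, y in zip(same_a, same_b)) / n
    p_a, p_b = sum(same_a) / n, sum(same_b) / n
    # Agreement expected by chance from the marginal rates.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0 if observed == 1.0 else 0.0
    return (observed - expected) / (1 - expected)


def cluster_f_score(cluster, reference_labels):
    """Best F1 between `cluster` (a set of indices) and any reference cluster."""
    reference_clusters = {}
    for index, label in enumerate(reference_labels):
        reference_clusters.setdefault(label, set()).add(index)
    best = 0.0
    for members in reference_clusters.values():
        overlap = len(cluster & members)
        if overlap == 0:
            continue
        precision = overlap / len(cluster)
        recall = overlap / len(members)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Same grouping under different label names.
    print(pairwise_kappa([0, 0, 1, 1], [1, 1, 0, 0]))
    # A cluster that absorbs one extra object from another group.
    print(cluster_f_score({0, 1, 2}, [0, 0, 1, 1]))
```

Under these assumptions, a selection step could keep base partitions whose kappa against the rest of the ensemble falls in a desired range, and a weighting step could average each cluster's F-score across the other partitions as a stability proxy. How the paper actually combines the two is not stated in the abstract.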