Revisiting Silhouette Aggregation

John Pavlopoulos, Georgios Vardakas, Aristidis Likas
2024-06-23
Abstract: Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are typically (micro) averaged into a single value. An alternative path, however, that is rarely employed, is to average first at the cluster level and then (macro) average across clusters. As we illustrate in this work with a synthetic example, the typical micro-averaging strategy is sensitive to cluster imbalance while the overlooked macro-averaging strategy is far more robust. By investigating macro-Silhouette further, we find that uniform sub-sampling, the only available strategy in existing libraries, harms the measure's robustness against imbalance. We address this issue by proposing a per-cluster sampling method. An experimental study on eight real-world datasets is then used to analyse both coefficients in two clustering tasks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper examines how per-point Silhouette scores are aggregated into a single dataset-level clustering score, particularly when cluster sizes are imbalanced. It focuses on two research questions:

1. **Is micro-averaging sensitive to cluster imbalance?**
   - The authors show through synthetic data experiments that micro-averaging can produce misleading results on imbalanced clusterings. Because every point carries equal weight, the score is dominated by the larger clusters, while the smaller clusters have little influence.
2. **Is uniform sampling suitable for macro-averaging, or does it harm robustness to cluster imbalance?**
   - The authors find that existing libraries implement only uniform sampling. On extremely imbalanced datasets, the smallest clusters can disappear from the sample, which distorts the macro-averaged score. To address this, the authors propose a per-cluster sampling method that preserves the robustness of macro-averaging.

### Main Contributions

1. **Comparison of Two Aggregation Strategies**:
   - The authors compare micro-averaging and macro-averaging and demonstrate the weaknesses of micro-averaging on imbalanced datasets.
2. **Introduction of a Cluster-based Sampling Method**:
   - The authors propose a per-cluster sampling method that is better suited to macro-averaging and handles imbalanced clusterings more reliably.
3. **Quantification of Micro-averaging Sensitivity on Imbalanced Synthetic Data**:
   - The authors quantify the sensitivity of micro-averaging on imbalanced synthetic data and validate the advantages of macro-averaging on two imbalanced real-world datasets.
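
The difference between the two aggregation strategies can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a plain-Python illustration that assumes Euclidean distance, and the toy points, the labels, and the helper names (`silhouette_samples`, `micro_silhouette`, `macro_silhouette`) are hypothetical. One large, well-separated cluster is combined with two tiny, poorly separated ones.

```python
import math
from collections import defaultdict


def silhouette_samples(points, labels):
    """Per-point Silhouette s(i) = (b - a) / max(a, b) with Euclidean distance.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    Assumes at least two clusters; singleton clusters get a score of 0.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        dist_sum = defaultdict(float)
        count = defaultdict(int)
        for j, (q, c) in enumerate(zip(points, labels)):
            if i != j:
                dist_sum[c] += math.dist(p, q)
                count[c] += 1
        if count[own] == 0:  # point is alone in its cluster
            scores.append(0.0)
            continue
        a = dist_sum[own] / count[own]
        b = min(dist_sum[c] / count[c] for c in count if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def micro_silhouette(scores):
    """Micro-average: every point has the same weight."""
    return sum(scores) / len(scores)


def macro_silhouette(scores, labels):
    """Macro-average: average within each cluster, then across clusters."""
    per_cluster = defaultdict(list)
    for s, c in zip(scores, labels):
        per_cluster[c].append(s)
    cluster_means = [sum(v) / len(v) for v in per_cluster.values()]
    return sum(cluster_means) / len(cluster_means)


# Hypothetical toy data: cluster 0 is large and compact,
# clusters 1 and 2 are tiny and sit right next to each other.
points = [(0.0, 0.01 * k) for k in range(20)]
points += [(10.0, 0.0), (10.1, 0.0), (10.2, 0.0), (10.3, 0.0)]
labels = [0] * 20 + [1, 1, 2, 2]

scores = silhouette_samples(points, labels)
micro = micro_silhouette(scores)          # approximately 0.91
macro = macro_silhouette(scores, labels)  # approximately 0.64
print(f"micro={micro:.2f} macro={macro:.2f}")
```

In this toy case the micro-average is pulled towards the score of the large cluster, whereas the macro-average gives the two poorly separated small clusters the same weight as the large one. In practice the per-point scores would more likely come from a library routine such as scikit-learn's `silhouette_samples`, with only the aggregation step changed.
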
### Experimental Setup and Results

- **Synthetic Data Experiments**:
  - The authors created a synthetic dataset containing 4 Gaussian clusters and simulated imbalance by increasing the number of points in one cluster. The micro-averaged score increases significantly under imbalance, while the macro-averaged score remains stable.
- **Real-world Dataset Experiments**:
  - The authors used 8 real-world datasets of different types, including numerical data, time series, and images. The results indicate that macro-averaging outperforms micro-averaging on highly imbalanced datasets.

### Conclusion

Through experimental and theoretical analysis, the authors demonstrate that macro-averaging is more robust than micro-averaging on imbalanced clusterings, and they propose a cluster-based sampling method that preserves this robustness when the score is computed on a sample. These findings are directly relevant to clustering evaluation in practice.
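
The sampling issue discussed above can be sketched as follows. The summary does not spell out the exact procedure of the authors' cluster-based sampling, so the code assumes one plausible reading: draw up to a fixed number of points from every cluster. The function names, the label vector, and the sample sizes are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def uniform_sample(labels, sample_size, rng):
    """Draw indices uniformly at random, ignoring cluster membership."""
    return rng.sample(range(len(labels)), sample_size)


def per_cluster_sample(labels, per_cluster_size, rng):
    """Draw up to `per_cluster_size` indices from every cluster.

    Assumed scheme for illustration, not necessarily the authors' method.
    """
    by_cluster = defaultdict(list)
    for idx, c in enumerate(labels):
        by_cluster[c].append(idx)
    chosen = []
    for members in by_cluster.values():
        k = min(per_cluster_size, len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


# Hypothetical extreme imbalance: one huge cluster and three tiny ones.
labels = [0] * 10_000 + [1] * 5 + [2] * 5 + [3] * 5
rng = random.Random(0)

uniform_idx = uniform_sample(labels, 20, rng)
stratified_idx = per_cluster_sample(labels, 5, rng)

uniform_counts = Counter(labels[i] for i in uniform_idx)
stratified_counts = Counter(labels[i] for i in stratified_idx)

# The uniform sample will usually contain cluster 0 only, because the tiny
# clusters make up about 0.15% of the data; the per-cluster sample always
# keeps every cluster represented.
print("uniform:    ", dict(uniform_counts))
print("per-cluster:", dict(stratified_counts))
```

A macro-averaged Silhouette computed on the uniform sample cannot account for clusters that were never drawn, which is the failure mode described above. With per-cluster sampling every cluster still contributes its own mean to the macro-average.
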