Abstract: Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are typically (micro) averaged into a single value. An alternative path, however, that is rarely employed, is to average first at the cluster level and then (macro) average across clusters. As we illustrate in this work with a synthetic example, the typical micro-averaging strategy is sensitive to cluster imbalance while the overlooked macro-averaging strategy is far more robust. By investigating macro-Silhouette further, we find that uniform sub-sampling, the only available strategy in existing libraries, harms the measure's robustness against imbalance. We address this issue by proposing a per-cluster sampling method. An experimental study on eight real-world datasets is then used to analyse both coefficients in two clustering tasks.
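The two aggregation strategies can be contrasted in a few lines of stdlib Python. Writing μ_k for the mean Silhouette of cluster C_k among K clusters and N points, micro-averaging computes Σ_k (|C_k|/N) μ_k, a size-weighted mean, while macro-averaging computes (1/K) Σ_k μ_k, giving every cluster one vote. This is a minimal sketch, not the authors' code: the function names are hypothetical, and scoring singleton clusters as 0 follows the usual Silhouette convention.

```python
import math
from collections import defaultdict

def silhouette_samples(points, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = defaultdict(list)  # label -> indices of its members
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def micro_silhouette(scores):
    """Micro-average: every point weighs the same."""
    return sum(scores) / len(scores)

def macro_silhouette(scores, labels):
    """Macro-average: mean per cluster first, then mean over clusters."""
    per_cluster = defaultdict(list)
    for s, lab in zip(scores, labels):
        per_cluster[lab].append(s)
    return sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)

# Toy 1-D data: a large, tight cluster A next to two small, nearby clusters B and C.
pts = [(0.0,)] * 8 + [(10.0,), (11.0,), (12.0,), (13.0,)]
labs = ["A"] * 8 + ["B", "B", "C", "C"]
scores = silhouette_samples(pts, labs)
print(round(micro_silhouette(scores), 4), round(macro_silhouette(scores, labs), 4))
# prints: 0.8222 0.6444
```

The eight points of cluster A each score 1.0, so they dominate the micro-average (37/45), while the macro-average (29/45) weighs A's mean equally with the weaker B and C. With equally sized clusters the two averages coincide.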

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper examines how Silhouette scores should be aggregated in clustering evaluation, particularly for datasets with imbalanced clusters. Specifically, it focuses on two research questions:

1. **Is micro-averaging sensitive to cluster imbalance?**
   - Through experiments on synthetic data, the authors show that micro-averaging can produce misleading results for imbalanced clusterings. Because every point carries equal weight, the score is dominated by the larger clusters, and the smaller clusters have little influence on it.
2. **Is uniform sampling suitable for macro-averaging, or does it undermine macro-averaging's robustness to cluster imbalance?**
   - The authors find that existing libraries implement only uniform sub-sampling, which on extremely imbalanced datasets can cause the smallest clusters to vanish from the sample and thus distort the macro-averaged result. To address this, they propose a new per-cluster sampling method that improves the robustness of macro-averaging.

### Main Contributions

1. **Comparison of Two Aggregation Strategies**:
   - The authors compare micro-averaging and macro-averaging and demonstrate the problems micro-averaging has on imbalanced datasets.
2. **Introduction of a Cluster-based Sampling Method**:
   - The authors propose a new sampling method that samples within each cluster, which suits macro-averaging better and handles imbalanced clusterings more reliably.
3. **Quantification of Micro-averaging Sensitivity on Imbalanced Synthetic Data**:
   - The authors measure how micro-averaging behaves on imbalanced synthetic data and validate the advantages of macro-averaging on two real-world imbalanced datasets.
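The sampling problem can be illustrated with a small stdlib sketch. Uniform sub-sampling (what, for example, the `sample_size` argument of scikit-learn's `silhouette_score` does) draws points regardless of cluster, so a cluster holding 0.1% of the data is usually absent from a 100-point sample. The per-cluster sampler here is a hypothetical stand-in for the authors' method, whose details are not given in this summary: as an assumption, it draws the same quota (`per_cluster`, an illustrative parameter) from every cluster.

```python
import random
from collections import defaultdict

def uniform_sample(labels, n, rng):
    """Cluster-agnostic sub-sampling: n indices drawn uniformly without replacement."""
    return rng.sample(range(len(labels)), n)

def per_cluster_sample(labels, per_cluster, rng):
    """Hypothetical per-cluster scheme: up to `per_cluster` indices from every cluster."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    chosen = []
    for members in clusters.values():
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen

def clusters_covered(labels, indices):
    """Set of cluster labels that appear among the sampled indices."""
    return {labels[i] for i in indices}

# Extreme imbalance: 9,990 points in one cluster, 10 in another.
labels = ["big"] * 9990 + ["tiny"] * 10
rng = random.Random(42)

trials = 200
missed = sum(
    "tiny" not in clusters_covered(labels, uniform_sample(labels, 100, rng))
    for _ in range(trials)
)
print(f"uniform sampling (100 points) missed the tiny cluster in {missed}/{trials} trials")

stratified = per_cluster_sample(labels, 50, rng)
print(sorted(clusters_covered(labels, stratified)), len(stratified))
# prints: ['big', 'tiny'] 60
```

A cluster that receives no sampled points drops out of the per-cluster means entirely, so a macro-average computed on such a sample silently averages over fewer clusters. Sampling within each cluster guarantees every cluster is represented.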
### Experimental Setup and Results

- **Synthetic Data Experiments**:
  - The authors created a synthetic dataset of 4 Gaussian clusters and simulated imbalance by increasing the number of points in one cluster. As imbalance grows, the micro-averaged score rises substantially, while the macro-averaged score remains stable.
- **Real-world Dataset Experiments**:
  - The authors used 8 real-world datasets of different types, including numerical, time-series, and image data. The results indicate that macro-averaging gives more reliable assessments than micro-averaging on highly imbalanced datasets.

### Conclusion

Through experimental and theoretical analysis, the authors demonstrate that macro-averaging is robust, and preferable to micro-averaging, when evaluating clusterings of imbalanced datasets. They also propose a new per-cluster sampling method that preserves this robustness when the score is computed on a sample. These findings are relevant to clustering evaluation in practice, particularly when cluster sizes are imbalanced.
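The synthetic experiment can be approximated with a toy re-creation. The cluster centers, spread (sd = 1), and sizes (50 points per cluster, then 500 extra points in one cluster) are our assumptions, not the paper's settings, and the Silhouette helper restates the textbook definition so the block runs on its own. Because the grown cluster is placed far from the others, its points score highly and pull the micro-average up, while the macro-average barely moves.

```python
import math
import random
from collections import defaultdict

def silhouette_samples(points, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters scored 0 by convention
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, points[j]) for j in m) / len(m)
                for other, m in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

def micro_macro(points, labels):
    """Return (micro-averaged, macro-averaged) silhouette."""
    s = silhouette_samples(points, labels)
    per_cluster = defaultdict(list)
    for score, lab in zip(s, labels):
        per_cluster[lab].append(score)
    micro = sum(s) / len(s)
    macro = sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)
    return micro, macro

# Hypothetical setup: cluster 0 is far from the rest; clusters 1-3 overlap a little.
CENTERS = [(20.0, 20.0), (0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

def gaussian_blob(rng, center, n, sd=1.0):
    """n points from an isotropic 2-D Gaussian around `center`."""
    return [(rng.gauss(center[0], sd), rng.gauss(center[1], sd)) for _ in range(n)]

rng = random.Random(0)
points, labels = [], []
for k, c in enumerate(CENTERS):
    points += gaussian_blob(rng, c, 50)
    labels += [k] * 50
balanced = micro_macro(points, labels)

# Imbalance: add 500 extra points to cluster 0 only; clusters 1-3 stay unchanged.
points_imb = points + gaussian_blob(rng, CENTERS[0], 500)
labels_imb = labels + [0] * 500
imbalanced = micro_macro(points_imb, labels_imb)

print("balanced   micro=%.3f macro=%.3f" % balanced)
print("imbalanced micro=%.3f macro=%.3f" % imbalanced)
```

The direction of the micro-average's drift depends on which cluster grows: enlarging one of the three overlapping clusters instead would pull it down. The macro-average is largely insensitive to this choice because each cluster contributes exactly one mean.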

Revisiting Silhouette Aggregation