Abstract: Because of the large variety of cluster validation indices (CVIs), choosing the most suitable index is challenging. We assessed several CVIs using artificial binary data sets. Only a few CVIs performed as expected with noisy data. Tau and silhouette widths proved to be the best geometric CVIs for both equal and unequal cluster sizes. Among non‐geometric indices, crispness and OptimClass performed best.

Aims: Different clustering methods often classify the same data set differently. Cluster validation indices make it possible to select the "best" clustering solution from the alternatives. Because of the large variety of CVIs, choosing the most suitable index for a given data set and clustering algorithm is challenging. We aim to assess different internal cluster validation indices.

Methods: We simulated artificial binary data sets with equal‐ and unequal‐sized, well‐separated a priori clusters and then added three levels of noise. We created twenty replicates of each of the six types of data set (two cluster‐size designs × three noise levels). We analyzed them with three clustering algorithms using Jaccard dissimilarity. We evaluated twenty‐seven cluster validation indices, including both geometric and non‐geometric indices.

Results: In theory, all CVIs can differentiate between good and wrong classifications. However, only a few performed as expected with noisy data. Tau and silhouette widths proved to be the best geometric CVIs for both equal and unequal cluster sizes. Among non‐geometric indices, crispness and OptimClass performed best.

Conclusion: We recommend using these best‐performing CVIs. We also suggest plotting the CVI value against the number of clusters. The lack of a sharp peak means that the position of the maximum is uncertain.
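The sketch below makes the geometric side of the evaluation more concrete. It uses plain Python (standard library only) to compute Jaccard dissimilarities on a binary plot × species matrix and the mean silhouette width of a partition. The silhouette follows the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean dissimilarity of object i to its own cluster, and b(i) is its mean dissimilarity to the nearest other cluster.

The generator `simulate_binary`, its parameters and its noise model are hypothetical illustrations, not the authors' simulation procedure. The noise model flips each cell with probability `noise`. The sketch contrasts the a priori partition with a shuffled, deliberately wrong partition of the same cluster sizes.

```python
import random
from statistics import mean


def simulate_binary(n_clusters, plots_per_cluster, species_per_cluster, noise, seed=0):
    """Hypothetical generator (not the paper's procedure): each a priori cluster
    owns a disjoint block of species present in all of its plots; every cell of
    the plot x species matrix is then flipped (0 <-> 1) with probability `noise`."""
    rng = random.Random(seed)
    n_species = n_clusters * species_per_cluster
    data, labels = [], []
    for c in range(n_clusters):
        for _ in range(plots_per_cluster):
            row = [1 if s // species_per_cluster == c else 0 for s in range(n_species)]
            row = [1 - x if rng.random() < noise else x for x in row]
            data.append(row)
            labels.append(c)
    return data, labels


def jaccard(a, b):
    """Jaccard dissimilarity between two binary vectors (0 if both are empty)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def silhouette_widths(d, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every object, given a full
    dissimilarity matrix `d` and a partition with at least two clusters."""
    n = len(labels)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = mean(own)  # mean dissimilarity to the object's own cluster
        # b: mean dissimilarity to the nearest other cluster
        b = min(mean([d[i][j] for j in range(n) if labels[j] == c])
                for c in clusters if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


data, labels = simulate_binary(n_clusters=3, plots_per_cluster=10,
                               species_per_cluster=8, noise=0.1, seed=1)
n = len(data)
d = [[jaccard(data[i], data[j]) for j in range(n)] for i in range(n)]

good = mean(silhouette_widths(d, labels))

wrong = labels[:]  # same cluster sizes, random membership
random.Random(2).shuffle(wrong)
bad = mean(silhouette_widths(d, wrong))

print(f"mean silhouette, a priori partition: {good:.3f}")
print(f"mean silhouette, shuffled partition: {bad:.3f}")
```

With well‐separated clusters and moderate noise, the a priori partition should give a clearly positive mean silhouette width, and the shuffled partition a value near zero. To follow the recommendation in the Conclusion, the same function could be evaluated for partitions with different numbers of clusters and plotted against the number of clusters. A flat curve without a sharp peak then signals an uncertain optimum.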

On the Use of Relative Validity Indices for Comparing Clustering Approaches

On the Index of Cluster Validity

A New Separation Measure for Improving the Effectiveness of Validity Indices

Extended multivariate comparison of 68 cluster validity indices. A review

A comparative study of different cluster validity indexes

An Internal Cluster Validity Index Using a Distance-based Separability Measure

Quantitative evaluation of internal cluster validation indices using binary data sets

New criteria for evaluating the validity of clustering

Graph Sensitive Indices for Comparing Clusterings

A Bayesian cluster validity index

Performance evaluation of some clustering algorithms and validity indices

Volume and Surface Area Based Cluster Validity Index

Efficient synthetical clustering validity indexes for hierarchical clustering

Stable Hierarchical Clustering Analysis Based on New Designed Cluster Validity Index

Normalised clustering accuracy: An asymmetric external cluster validity measure

Comparing high dimensional partitions, with the Coclustering Adjusted Rand Index

An Unsupervised and Robust Validity Index for Clustering Analysis

Particle Swarm Optimization Based Clustering: A Comparison of Different Cluster Validity Indices

From A-to-Z Review of Clustering Validation Indices

A Distance-based Separability Measure for Internal Cluster Validation