Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning.

What problem does this paper attempt to address?

The paper examines how deep clustering results should be evaluated and proposes a theoretical framework and an evaluation strategy for doing so. In deep clustering, a deep neural network projects the data into a low-dimensional embedding space, and the partitioning happens there. Traditional internal validation measures were designed for low-dimensional spaces and can be unreliable in this setting. The paper identifies two key issues: the curse of dimensionality when the measures are computed on the raw data, and the unreliable comparison of clustering results that different models produce in different embedding spaces.

The paper presents the following main contributions:

1. Theoretical proof: Computing clustering validity measures on either the raw high-dimensional data or a single model's embedding space does not guarantee that the resulting ranking of clustering results is consistent with the ranking given by the ground truth. The paper also characterizes the theoretical properties that distinguish acceptable embedding spaces from the full set of candidate embedding spaces.
2. Evaluation strategy: Based on this analysis, the paper proposes a strategy for identifying acceptable embedding spaces during evaluation. Combining the internal measure scores computed in the selected embedding spaces makes the evaluation more robust.

Experiments on hyperparameter tuning, selection of the number of clusters, and checkpoint selection show that the proposed framework agrees more closely with external validation measures. The paper also points out that, although embedded data alleviates the curse of dimensionality, the embedding spaces produced by different models are not directly comparable, so internal measures must themselves be validated before they are used to compare deep clustering results.
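The idea of combining internal measure scores across several embedding spaces can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: the silhouette coefficient stands in for the internal measure, the combination is a plain average, and the list of acceptable embedding spaces is taken as given (the paper's selection procedure is not reproduced here). The names `silhouette`, `rank_partitions`, `space_a`, and `space_b` are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (pure Python)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def rank_partitions(selected_spaces, partitions):
    """Average each partition's score over the given spaces, then rank.

    Assumption: `selected_spaces` already contains only acceptable
    embedding spaces; how they are selected is outside this sketch.
    """
    combined = {
        name: mean(silhouette(space, labels) for space in selected_spaces)
        for name, labels in partitions.items()
    }
    ranking = sorted(combined, key=combined.get, reverse=True)
    return ranking, combined


# Toy data: the same six samples embedded by two hypothetical models.
space_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
space_b = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]

# Two candidate partitions of those six samples.
partitions = {
    "good": [0, 0, 0, 1, 1, 1],
    "bad": [0, 1, 0, 1, 0, 1],
}

ranking, combined = rank_partitions([space_a, space_b], partitions)
print(ranking, combined)
```

Scoring every candidate partition in the same set of spaces keeps the scores comparable, which is the point of the sketch: a score computed only in the embedding space of the model that produced a partition cannot be compared fairly with a score computed in another model's space.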

Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis

CVAP: Validation for Cluster Analyses

Deep Discriminative Clustering Analysis

From A-to-Z Review of Clustering Validation Indices

An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

Deep clustering framework review using multicriteria evaluation

Internal Purity: A Differential Entropy Based Internal Validation Index for Crisp and Fuzzy Clustering Validation

Deep Embedding Clustering Based on Residual Autoencoder

Internal Purity: A Differential Entropy based Internal Validation Index for Clustering Validation

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Deep Clustering: A Comprehensive Survey

Deep Embedded K-Means Clustering

Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models

Deep image clustering: A survey

Deep Divergence-Based Approach to Clustering

Deep Density-based Image Clustering

External validation measures for K-means clustering

A new approach for evaluating internal cluster validation indices

Deep Reinforcement Clustering

Deep Discriminative Latent Space for Clustering