Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Zeya Wang, Chenglong Ye
2024-03-22
Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning.
Machine Learning
What problem does this paper attempt to address?
The paper examines how to evaluate deep clustering and proposes a theoretical framework and a practical strategy for doing so. In deep clustering, a deep neural network projects the data into a low-dimensional embedding space, where the data is then partitioned. Traditional internal validation measures were designed for low-dimensional data, so they can be ineffective when applied to the raw high-dimensional data. The paper identifies two key issues: the curse of dimensionality on the raw data, and the unreliable comparison of clustering results across the different embedding spaces produced by different models.

The paper makes the following main contributions:
1. Theoretical results: It shows that computing clustering validity measures either on the raw high-dimensional data or on each model's own embedding space does not guarantee that the resulting comparison of clustering results is consistent with the ground truth. It also characterizes the theoretical properties that distinguish acceptable embedding spaces among all candidate embedding spaces.
2. Evaluation strategy: Based on the theoretical analysis, it proposes a strategy for identifying acceptable embedding spaces during evaluation. Combining the internal measure scores from the selected embedding spaces makes the evaluation results more robust.

Experiments on hyperparameter tuning, cluster number selection, and checkpoint selection demonstrate the effectiveness of the proposed framework for evaluating deep clustering methods. The paper also notes that although embeddings alleviate the curse of dimensionality, the embedding spaces generated by different models are not directly comparable, so internal measures computed in them must be validated properly before they are used to compare models.
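
The evaluation strategy summarized above can be illustrated with a small sketch. It is not the authors' implementation: the internal measure (mean silhouette), the toy data, and the function names `silhouette`, `score_matrix`, and `combine_scores` are assumptions made for illustration. This summary does not state how acceptable embedding spaces are identified, so the sketch takes them as a caller-supplied `acceptable` argument (hypothetical) rather than guessing at a criterion. The point it shows is that every partition is scored in the same set of embedding spaces before scores are combined, instead of scoring each partition only in the space of the model that produced it.

```python
import numpy as np


def silhouette(Z, labels):
    """Mean silhouette coefficient of `labels`, measured in space `Z` (Euclidean)."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    n = len(labels)
    a = np.zeros(n)            # mean distance to own cluster
    b = np.full(n, np.inf)     # mean distance to nearest other cluster
    singleton = np.zeros(n, dtype=bool)
    for c in clusters:
        members = labels == c
        size = members.sum()
        dist_sum = D[:, members].sum(axis=1)
        if size > 1:
            # Self-distance is 0, so divide by (size - 1).
            a[members] = dist_sum[members] / (size - 1)
        else:
            singleton |= members
        b[~members] = np.minimum(b[~members], dist_sum[~members] / size)
    denom = np.maximum(a, b)
    s = np.where(denom > 0, (b - a) / np.where(denom > 0, denom, 1.0), 0.0)
    s[singleton] = 0.0         # usual convention for singleton clusters
    return float(s.mean())


def score_matrix(embeddings, partitions):
    """S[i, j] = internal score of partition j evaluated in embedding space i."""
    return np.array([[silhouette(Z, y) for y in partitions] for Z in embeddings])


def combine_scores(S, acceptable=None):
    """Average each partition's score over the selected embedding spaces.

    `acceptable` is a hypothetical list of row indices; how to choose it is
    not specified in this summary, so by default all spaces are used.
    """
    rows = list(range(S.shape[0])) if acceptable is None else list(acceptable)
    return S[rows].mean(axis=0)


# Toy setup (assumed): two "models", each with its own embedding and partition.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(20, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(20, 2))
Z_a = np.vstack([blob_a, blob_b])        # embedding space of model A
Z_b = Z_a * np.array([1.0, 0.1])         # model B: same points, different geometry
y_a = np.repeat([0, 1], 20)              # partition from model A (matches the blobs)
y_b = np.arange(40) % 2                  # partition from model B (ignores the blobs)

S = score_matrix([Z_a, Z_b], [y_a, y_b])
naive = np.diag(S)                       # each partition scored only in its own space
combined = combine_scores(S)             # each partition scored in every space
```

Here `naive` compares numbers computed in different geometries, which is the practice the paper warns against, while `combined` compares the two partitions on a common footing. Averaging raw scores is only one simple way to combine them; averaging ranks would be an equally plausible choice under the same assumptions.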