Prediction Consistency Regularization for Learning with Noise Labels Based on Contrastive Clustering

Xinkai Sun, Sanguo Zhang, Shuangge Ma
DOI: https://doi.org/10.3390/e26040308
IF: 2.738
2024-03-31
Entropy
Abstract: In the classification task, label noise has a significant impact on models' performance, primarily manifested in the disruption of prediction consistency, thereby reducing the classification accuracy. This work introduces a novel prediction consistency regularization that mitigates the impact of label noise on neural networks by imposing constraints on the prediction consistency of similar samples. However, determining which samples should be similar is a primary challenge. We formalize the similar sample identification as a clustering problem and employ twin contrastive clustering (TCC) to address this issue. To ensure similarity between samples within each cluster, we enhance TCC by adjusting the clustering prior distribution using label information. Based on the adjusted TCC's clustering results, we first construct the prototype for each cluster and then formulate a prototype-based regularization term to enhance prediction consistency for the prototype within each cluster and counteract the adverse effects of label noise. We conducted comprehensive experiments using benchmark datasets to evaluate the effectiveness of our method under various scenarios with different noise rates. The results explicitly demonstrate the enhancement in classification accuracy. Subsequent analytical experiments confirm that the proposed regularization term effectively mitigates noise and that the adjusted TCC enhances the quality of similar sample recognition.
physics, multidisciplinary
What problem does this paper attempt to address?
The paper mainly addresses the issue of the impact of label noise on model performance in machine learning classification tasks. Specifically, the paper proposes a new method called Prediction Consistency Regularization (PCR) to mitigate the negative effects of noisy labeled datasets during the neural network training process. The paper points out that in classification tasks, label noise significantly affects model performance, primarily by disrupting the prediction consistency among similar samples, thereby reducing classification accuracy. To alleviate this problem, the authors' proposed method is divided into two parts:

1. **Improvement of Twin Contrastive Clustering (TCC)**:
   - TCC is used to identify which samples in the dataset are similar. TCC is a contrastive learning framework that can generate representations and cluster based on these representations.
   - The paper improves TCC to better utilize label information, i.e., considering label consistency during the clustering process to improve clustering quality. This includes constructing an "alignment matrix" to reflect the relationship between categories and clusters, and introducing a confidence threshold to filter out label information that might be misleading due to noise.
2. **Prototype-based Regularization based on Clustering Results**:
   - Based on the improved TCC clustering results, the paper proposes a prototype regularization term. This regularization term aims to enhance the prediction consistency within clusters by penalizing the difference between the prediction distribution of samples within the same cluster and the cluster prototype.
   - The cluster prototype is obtained by the weighted average of the prediction distributions of samples within each cluster, where the weights are determined by the confidence of the samples belonging to the cluster.

Experimental results show that the proposed regularization method effectively improves classification accuracy under different noise rates.
Additionally, subsequent analysis experiments verified that the proposed regularization term can indeed effectively mitigate the impact of noise, and the improved TCC enhances the quality of similar sample identification.
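
The prototype-based regularization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cluster_prototypes`, `kl_divergence`, `consistency_penalty`) are hypothetical, each sample is assumed to contribute only to its most likely cluster, and the "difference" between a prediction and its prototype is assumed to be a KL divergence, which the summary does not specify.

```python
import math


def cluster_prototypes(preds, cluster_probs):
    """Confidence-weighted mean of prediction distributions per cluster.

    preds[i]         : class-probability vector predicted for sample i
    cluster_probs[i] : soft cluster-membership vector for sample i
    Assumption of this sketch: a sample counts only toward its most
    likely cluster, weighted by its membership confidence there.
    """
    num_clusters = len(cluster_probs[0])
    num_classes = len(preds[0])
    sums = [[0.0] * num_classes for _ in range(num_clusters)]
    weights = [0.0] * num_clusters
    assignments = []
    for p, q in zip(preds, cluster_probs):
        k = max(range(num_clusters), key=lambda j: q[j])  # hard assignment
        assignments.append(k)
        weights[k] += q[k]
        for c in range(num_classes):
            sums[k][c] += q[k] * p[c]
    # Empty clusters have no prototype (None) and are never looked up.
    prototypes = [
        [s / w for s in row] if w > 0 else None
        for row, w in zip(sums, weights)
    ]
    return prototypes, assignments


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_penalty(preds, cluster_probs):
    """Mean divergence between each cluster prototype and its members' predictions."""
    prototypes, assignments = cluster_prototypes(preds, cluster_probs)
    total = sum(
        kl_divergence(prototypes[k], p) for p, k in zip(preds, assignments)
    )
    return total / len(preds)


# Toy data: three samples, two classes, two clusters.
preds = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
cluster_probs = [[0.95, 0.05], [0.9, 0.1], [0.1, 0.9]]
penalty = consistency_penalty(preds, cluster_probs)
```

In training, a term like `penalty` would presumably be added to the classification loss with a weighting coefficient, so that samples in the same cluster are pushed toward the same prediction even when some of their labels are wrong. The penalty is zero when every sample in a cluster already predicts the same distribution.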