Abstract: Noisy label learning aims to train deep neural networks using a large number of samples with noisy labels; its main challenge is how to deal with the inaccurate supervision caused by wrong labels. Existing works adopt either the label correction or the sample selection paradigm to bring more samples with accurate labels into the training process. In this paper, we propose a simple yet effective sample selection algorithm, termed Pairwise Similarity Distribution Clustering (PSDC), to divide the training samples into a clean set and a noisy set, which can power any off-the-shelf semi-supervised learning regime to further train networks for different downstream tasks. Specifically, we use the pairwise similarity between sample pairs to represent the sample structure, and a Gaussian Mixture Model (GMM) to model the similarity distribution between sample pairs belonging to the same noisy cluster, so that each sample can be confidently assigned to the clean set or the noisy set. Even under a severe label noise rate, the resulting data partition mechanism is shown, in both theory and practice, to be more robust in judging label confidence. Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper primarily addresses the issue of how to train deep neural networks in the presence of noisy labels. Specifically:

1. **Objective of Noisy Label Learning**:
   - Utilize a large amount of data with noisy labels to train deep neural networks.
   - The main challenge is how to handle inaccurate supervision caused by incorrect labels.
2. **Problems with Existing Methods**:
   - Existing methods either adopt label correction approaches or sample selection methods to include more samples with accurate labels in the training process.
   - These methods still struggle to improve supervision quality in cases of severe noisy label rates.
3. **Proposed Method**:
   - A simple and effective sample selection algorithm called **Pairwise Similarity Distribution Clustering (PSDC)** is proposed.
   - The training samples are divided into a clean set and a noisy set, and these data are further used to train the network for different downstream tasks.
   - The sample structure is represented by calculating the pairwise similarity between sample pairs, and a Gaussian Mixture Model (GMM) is used to model the similarity distribution between sample pairs belonging to the same noise cluster.
   - This data partitioning mechanism is proven to be more robust both theoretically and practically, even under severe label noise rates.
4. **Main Contributions**:
   - A new PSDC method is proposed to improve the accuracy of data partitioning through pairwise sample structure and Gaussian Mixture Model.
   - Clear theoretical analysis of Jensen-Shannon divergence, cross-entropy criterion, and Gaussian Mixture Model is provided, demonstrating the method's broad noise tolerance range.
   - Extensive experiments on CIFAR-10, CIFAR-100, and Clothing1M datasets were conducted, achieving state-of-the-art results.
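
The partitioning idea summarized above (pairwise similarity within one noisy cluster, followed by a two-component GMM) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-sample score (mean cosine similarity to the other samples carrying the same noisy label), the hand-written 1-D EM routine, the 0.5 posterior threshold, the toy 4-dimensional features, and all function names (`cosine`, `mean_pairwise_similarity`, `fit_gmm_1d`, `partition`) are assumptions made for illustration. A real pipeline would more likely use network embeddings and a library GMM such as scikit-learn's `GaussianMixture`.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_pairwise_similarity(features):
    """Assumed per-sample score: mean similarity to all other samples in the cluster."""
    n = len(features)
    scores = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(x, iters=50):
    """EM for a two-component 1-D GMM; returns P(high-mean component | x_i)."""
    mu = [min(x), max(x)]  # initialise the two means at the extremes
    mean_x = sum(x) / len(x)
    var0 = sum((v - mean_x) ** 2 for v in x) / len(x) + 1e-6
    var = [var0, var0]
    weight = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in x]
    for _ in range(iters):
        # E-step: responsibilities of each component for each score
        for i, v in enumerate(x):
            dens = [weight[k] * gaussian_pdf(v, mu[k], var[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp[i] = [d / total for d in dens]
        # M-step: re-estimate means, variances (with a small floor), and weights
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk + 1e-6
            weight[k] = nk / len(x)
    high = 0 if mu[0] > mu[1] else 1  # high-similarity component is treated as "clean"
    return [r[high] for r in resp]


def partition(features, threshold=0.5):
    """Split one noisy cluster into clean and noisy index lists."""
    scores = mean_pairwise_similarity(features)
    posterior = fit_gmm_1d(scores)
    clean = [i for i, p in enumerate(posterior) if p >= threshold]
    noisy = [i for i, p in enumerate(posterior) if p < threshold]
    return clean, noisy


# Toy cluster: 8 samples that really belong to the class, 2 mislabelled ones.
rng = random.Random(0)
true_members = [[1 + rng.gauss(0, 0.05)] + [rng.gauss(0, 0.05) for _ in range(3)]
                for _ in range(8)]
mislabelled = [[rng.gauss(0, 0.05), 1 + rng.gauss(0, 0.05),
                rng.gauss(0, 0.05), rng.gauss(0, 0.05)] for _ in range(2)]
clean_idx, noisy_idx = partition(true_members + mislabelled)
print("clean:", clean_idx)
print("noisy:", noisy_idx)
```

In this toy setting the correctly labelled samples are similar to most of their cluster, so their scores concentrate in the high-mean component, while the mislabelled samples fall into the low-mean component. The two index lists could then be handed to a semi-supervised learner as labelled and unlabelled data, respectively.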

Pairwise Similarity Distribution Clustering for Noisy Label Learning

FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling

Robust Noisy Label Learning via Two-Stream Sample Distillation

Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels

Learning With Non-Uniform Label Noise: A Cluster-Dependent Weakly Supervised Approach

Learning with Neighbor Consistency for Noisy Labels

Neighborhood Collective Estimation for Noisy Label Identification and Correction

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Label Correction Using Contrastive Prototypical Classifier for Noisy Label Learning

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Feature-Induced Label Distribution for Learning with Noisy Labels

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

DST: Data Selection and joint Training for Learning with Noisy Labels

An Efficient Noisy Label Learning Method with Semi-supervised Learning

Learning with noisy labels using collaborative sample selection and contrastive semi-supervised learning

On Better Detecting and Leveraging Noisy Samples for Learning with Severe Label Noise

Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels

Learning with Noisy Labels by Semantic and Feature Space Collaboration

Decoding class dynamics in learning with noisy labels

Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels