Abstract: Label noise is ubiquitous in real-world scenarios, posing a practical challenge to supervised models because it hurts the generalization performance of deep neural networks. Existing methods primarily employ the sample selection paradigm and usually rely on dataset-dependent prior knowledge (e.g., a pre-defined threshold) to cope with label noise, inevitably degrading the adaptivity. Moreover, existing methods tend to neglect the class balance in selecting samples, leading to biased model performance. To this end, we propose a simple yet effective approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner. Specifically, we first design a novel sample selection strategy to empower self-adaptivity and class balance when identifying clean and noisy data. A mean-teacher model is then employed to correct labels of noisy samples. Subsequently, we propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples. Finally, we additionally employ consistency regularization on selected clean samples to improve model generalization performance. Extensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method. The source code has been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/SED.
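The abstract states only that a mean-teacher model corrects the labels of noisy samples. As a rough illustration of that idea, the sketch below uses the standard mean-teacher construction, in which the teacher's weights are an exponential moving average (EMA) of the student's weights. The function names, the `decay` value, and the use of the teacher's most probable class as the corrected label are assumptions made for illustration, not details confirmed by the paper.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Move the teacher weights toward the student weights by an EMA step.

    `decay` is a hypothetical hyperparameter; the value used by SED is not given here.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def correct_label(teacher_probs: List[float]) -> int:
    """Assumed correction rule: take the teacher's most probable class as the new label."""
    return max(range(len(teacher_probs)), key=lambda k: teacher_probs[k])


# Usage: one EMA step on a toy two-parameter model, then one label correction.
new_teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.5)
new_label = correct_label([0.1, 0.7, 0.2])
```

Because the teacher averages the student over many steps, its predictions tend to change more slowly than the student's, which is the usual motivation for using it as the source of corrected labels.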

What problem does this paper attempt to address?

The paper addresses the challenges deep learning models face when trained on data with noisy labels, and proposes a method called SED (Self-adaptivE and class-balanceD). Specifically, it targets the following key issues:

1. **Handling the problem of noisy labels**: In real-world applications, training data often comes with noisy labels, which can severely affect the generalization performance of deep neural networks. Existing methods typically rely on sample selection strategies and often require dataset-specific prior knowledge (such as predefined thresholds), which limits their adaptability.
2. **Improving adaptability and class balance**: Existing methods often overlook class balance when dealing with noisy labels, leading to biased model performance. They also adapt poorly when distinguishing between clean and noisy data.
3. **Limitations of sample selection and reweighting**: Traditional sample selection methods often use cross-entropy loss to distinguish between clean and noisy samples, but this approach usually requires predefined discard rates or thresholds and tends to ignore inter-class imbalance. Sample reweighting methods can mitigate the impact of noisy samples, but they often require additional prior information.

To address these issues, the paper proposes the SED method, which has four components:

- **Adaptive and class-balanced sample selection**: A new sample selection strategy uses predicted probabilities to identify clean and noisy samples, and combines global and local thresholds to promote class balance and make selection self-adaptive.
- **Label correction**: A mean-teacher model corrects the labels of noisy samples.
- **Adaptive and class-balanced sample reweighting**: Noisy samples receive different weights based on the confidence of their label correction, which mitigates the negative impact of imbalanced label correction.
- **Consistency regularization**: Consistency regularization is applied to the selected clean samples to further improve the model's generalization ability.

According to the paper, experiments on both synthetic and real-world datasets demonstrate the effectiveness and superiority of SED.
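The summary above says that selection uses predicted probabilities together with global and local thresholds, but it does not give the formulas. The following is a minimal sketch under stated assumptions: the global threshold is taken as the mean confidence of the batch in its given labels, and each class's local threshold scales that value by the class's mean confidence relative to the most confident class. Both choices, and the name `select_clean`, are hypothetical and are not claimed to be SED's actual rule.

```python
from typing import Dict, List, Tuple


def select_clean(probs: List[List[float]], labels: List[int]) -> Tuple[List[bool], Dict[int, float]]:
    """Flag samples as clean using a global threshold scaled per class.

    probs[i][k] is the predicted probability of class k for sample i;
    labels[i] is the given (possibly noisy) label of sample i.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    # Confidence of each sample in its own given label.
    conf = [p[y] for p, y in zip(probs, labels)]
    # Global threshold (assumption): mean confidence over the whole batch.
    global_thr = sum(conf) / len(conf)
    # Per-class mean confidence, used to derive the local thresholds.
    by_class: Dict[int, List[float]] = {}
    for c, y in zip(conf, labels):
        by_class.setdefault(y, []).append(c)
    class_mean = {y: sum(v) / len(v) for y, v in by_class.items()}
    top = max(class_mean.values()) or 1.0  # avoid dividing by zero
    # Local threshold (assumption): classes with lower mean confidence get a
    # lower bar, so they are not starved of selected samples.
    local_thr = {y: global_thr * m / top for y, m in class_mean.items()}
    is_clean = [c >= local_thr[y] for c, y in zip(conf, labels)]
    return is_clean, local_thr


# Usage: class 1 is predicted less confidently, so its threshold is lower.
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
toy_labels = [0, 0, 1, 1]
clean_mask, thresholds = select_clean(toy_probs, toy_labels)
```

No threshold is supplied by hand here: every value is computed from the current predictions, which is the sense in which such a rule is self-adaptive. The per-class scaling is what keeps a hard class from being filtered out wholesale by a single global cut-off.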

Foster Adaptivity and Balance in Learning with Noisy Labels

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

Noise is the Fatal Poison: A Noise-aware Network for Noisy Dataset Classification

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Learning to Detect Noisy Labels Using Model-Based Features

Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels

On Better Detecting and Leveraging Noisy Samples for Learning with Severe Label Noise

Learning from Noisy Labels with Decoupled Meta Label Purifier

Label-noise learning via uncertainty-aware neighborhood sample selection

Adaptive Textual Label Noise Learning Based on Pre-trained Models

Dynamic training for handling textual label noise

Learning with Feature-Dependent Label Noise: A Progressive Approach

Combating Label Noise With A General Surrogate Model For Sample Selection

Mitigating Memorization in Sample Selection for Learning with Noisy Labels

Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain Adaptation

Decoding class dynamics in learning with noisy labels

Learning With Noisy Labels Over Imbalanced Subpopulations

Multi-Level Consistency Learning for Source-Free Model Adaptation

Estimating Per-Class Statistics for Label Noise Learning