Foster Adaptivity and Balance in Learning with Noisy Labels

Mengmeng Sheng, Zeren Sun, Tao Chen, Shuchao Pang, Yucheng Wang, Yazhou Yao
2024-07-03
Abstract: Label noise is ubiquitous in real-world scenarios, posing a practical challenge to supervised models due to its effect in hurting the generalization performance of deep neural networks. Existing methods primarily employ the sample selection paradigm and usually rely on dataset-dependent prior knowledge (e.g., a pre-defined threshold) to cope with label noise, inevitably degrading the adaptivity. Moreover, existing methods tend to neglect the class balance in selecting samples, leading to biased model performance. To this end, we propose a simple yet effective approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner. Specifically, we first design a novel sample selection strategy to empower self-adaptivity and class balance when identifying clean and noisy data. A mean-teacher model is then employed to correct labels of noisy samples. Subsequently, we propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples. Finally, we additionally employ consistency regularization on selected clean samples to improve model generalization performance. Extensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method. The source code has been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/SED.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of training deep neural networks on data with noisy labels and proposes a method called SED (Self-adaptivE and class-balanceD). Specifically, it targets the following issues:

1. **Noisy labels**: In real-world applications, training data often contains noisy labels, which can severely hurt the generalization performance of deep neural networks. Existing methods typically rely on sample selection and often require dataset-specific prior knowledge (such as a predefined threshold), which limits their adaptability.
2. **Adaptability and class balance**: Existing methods often overlook class balance when selecting samples, leading to biased model performance. They also adapt poorly when distinguishing clean data from noisy data.
3. **Limitations of sample selection and re-weighting**: Traditional sample selection methods often use the cross-entropy loss to separate clean and noisy samples, but this usually requires a predefined discard rate or threshold and tends to ignore inter-class imbalance. Sample re-weighting methods can mitigate the impact of noisy samples, but they often require additional prior information.

To address these issues, SED combines four components:

- **Adaptive and class-balanced sample selection**: A new selection strategy uses predicted probabilities to identify clean and noisy samples, with global and local thresholds that make the selection self-adaptive and promote class balance.
- **Label correction**: A mean-teacher model corrects the labels of noisy samples.
- **Adaptive and class-balanced sample re-weighting**: Noisy samples receive different weights based on the confidence of their corrected labels, which mitigates the negative impact of imbalanced label correction.
- **Consistency regularization**: Consistency regularization is applied to the selected clean samples to further improve the model's generalization ability.

Experiments on both synthetic and real-world datasets demonstrate the effectiveness and superiority of SED.
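
The first three components can be illustrated with a small NumPy sketch. It is not the authors' implementation: the summary does not give the exact threshold or weighting formulas, so the rules below (global threshold as the mean confidence on the given label, per-class scaling by the mean predicted class probability, weights equal to teacher confidence) are assumptions chosen only to show the idea. The function names `select_clean`, `ema_update`, and `correct_and_weight` are hypothetical, and consistency regularization is omitted.

```python
import numpy as np


def select_clean(probs, labels):
    """Split samples into clean / noisy without a hand-set threshold.

    probs:  (N, C) softmax outputs of the student model.
    labels: (N,) given, possibly noisy, integer labels.
    """
    # Confidence the model assigns to each sample's given label.
    conf = probs[np.arange(len(labels)), labels]
    # Global threshold (assumed rule): mean confidence over all samples.
    global_thr = conf.mean()
    # Local statistic (assumed rule): mean predicted probability per class.
    class_mean = probs.mean(axis=0)
    # Scale the global threshold per class, so classes the model is
    # less sure about get a lower bar and are not starved of samples.
    local_thr = global_thr * class_mean / class_mean.max()
    clean_mask = conf >= local_thr[labels]
    return clean_mask, local_thr


def ema_update(teacher, student, momentum=0.99):
    """Mean-teacher update: teacher weights are an EMA of student weights."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}


def correct_and_weight(teacher_probs, clean_mask):
    """Relabel noisy samples with the teacher and weight them by confidence."""
    pseudo_labels = teacher_probs.argmax(axis=1)
    # Assumed rule: clean samples keep weight 1, noisy samples are
    # weighted by the teacher's confidence in the corrected label.
    weights = np.where(clean_mask, 1.0, teacher_probs.max(axis=1))
    return pseudo_labels, weights


probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 0])
clean_mask, local_thr = select_clean(probs, labels)
pseudo_labels, weights = correct_and_weight(probs, clean_mask)
```

In this toy batch the last two samples have low confidence on their given labels, so they are treated as noisy, relabeled from the (here, reused) predictions, and down-weighted. In a real training loop the teacher predictions would come from a separate model updated with `ema_update` after each student step.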