Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang
2024-08-27
Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to identify mislabeled samples during training and prevent them from degrading the model's robustness. In this workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ alternate iteratively. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning with noisy labels: 1) The model chooses clean data with selection bias, which leads to accumulated error in the model update. 2) Most selection strategies leverage partner networks or supplementary information to mitigate label corruption, at the cost of more computational resources and lower throughput. Therefore, we employ only one network with a jump-manner update to decouple the interplay, and we mine more semantic information from the loss for a more precise selection. Specifically, the selection of clean data for each model update is based on one of the prior models, excluding the one from the last iteration, so the model update strategy takes the form of jumps. Moreover, we map the outputs of the network and the labels into the same semantic feature space. In this space, a detailed and simple loss distribution is generated to distinguish clean samples more effectively. Our proposed approach achieves up to almost $2.53\times$ speedup, $0.46\times$ peak memory footprint, and superior robustness over state-of-the-art works under various noise settings.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of training on data with noisy labels, particularly in deep learning, where models are prone to overfitting the noisy labels and generalize worse as a result. It proposes a method called "Jump-teaching," whose core objective is to improve training efficiency while preserving model robustness. The specific problems the paper attempts to solve are:

1. **Reducing Error Accumulation**: Traditional sample selection methods accumulate error over the iterative process because of selection bias. Jump-teaching reduces this accumulated error through a jump-update strategy.
2. **Improving Training Efficiency**: Many existing methods for handling noisy labels rely on dual-network structures, which improve robustness but sacrifice efficiency. Jump-teaching uses only a single network and relies on the jump-update strategy to keep learning both efficient and robust.
3. **Enhancing Loss Representation**: To select clean samples more accurately, the paper proposes semantic loss decomposition, which breaks the information in the loss into a richer form and thereby identifies noisy samples better.

Through these methods, Jump-teaching improves the model's robustness to noisy labels while also significantly improving training speed and memory efficiency, making it particularly suitable for real-time applications and scenarios with high security requirements.
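
The jump-update idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the small-loss selection rule and the names `small_loss_mask`, `jump_schedule`, `keep_ratio`, and `jump_step` are assumptions introduced here for illustration. The only property taken from the abstract is that the clean-sample selection used for an update comes from a prior model other than the one from the last iteration.

```python
def small_loss_mask(losses, keep_ratio):
    """Flag the keep_ratio fraction of samples with the smallest loss as 'clean'.

    Assumption: a generic small-loss criterion stands in for the paper's
    selection rule, which is not specified in the text above.
    """
    k = max(1, int(len(losses) * keep_ratio))
    keep = set(sorted(range(len(losses)), key=losses.__getitem__)[:k])
    return [i in keep for i in range(len(losses))]


def jump_schedule(num_iters, jump_step):
    """For each iteration t, return the earlier iteration whose selection it uses.

    jump_step must be at least 2 so that the selector is never the model from
    the immediately preceding iteration. Warm-up iterations with no eligible
    selector are marked None.
    """
    if jump_step < 2:
        raise ValueError("jump_step must be >= 2 to skip the last iteration")
    return [t - jump_step if t >= jump_step else None for t in range(num_iters)]


if __name__ == "__main__":
    # Hypothetical per-sample losses for four samples.
    print(small_loss_mask([0.1, 2.3, 0.4, 1.9], keep_ratio=0.5))
    # Which earlier iteration supplies the selection for each update.
    print(jump_schedule(6, jump_step=2))
```

With `jump_step=2`, the schedule is `[None, None, 0, 1, 2, 3]`: even iterations only ever consume selections made at even iterations and odd ones at odd iterations, so a selection error made by one model does not feed directly into the very next update.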