Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing
Abstract: Deep neural networks have shown great success in representation learning. However, when learning with noisy labels (LNL), they can easily overfit and fail to generalize to new data. This paper introduces a simple and effective method, named Learning to Bootstrap (L2B), which enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels. It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning. Unlike existing instance reweighting methods, the key to our method lies in a new, versatile objective that enables implicit relabeling concurrently, leading to significant improvements without incurring additional costs.
L2B offers several benefits over the baseline methods. It yields more robust models that are less susceptible to the impact of noisy labels by guiding the bootstrapping procedure more effectively. It better exploits the valuable information contained in corrupted instances by adapting the weights of both instances and labels. Furthermore, L2B is compatible with existing LNL methods and delivers competitive results spanning natural and medical imaging tasks including classification and segmentation under both synthetic and real-world noise. Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often necessitating few to no validation samples, and is well generalized to other tasks such as image segmentation. This not only positions it as a robust complement to existing LNL techniques but also underscores its practical applicability. The code and models are available at https://github.com/yuyinzhou/l2b.
What problem does this paper attempt to address?
This paper addresses how to train robust deep-learning models when the training labels are noisy. It proposes Learning to Bootstrap (L2B), which reduces the impact of label noise by dynamically adjusting, through meta-learning, both the weight between real (observed) labels and generated pseudo-labels and the weights across different samples. The method exploits the useful information that remains in corrupted instances instead of discarding them, and it often needs few or no clean validation samples, which improves the generalization and accuracy of the trained model.
### Key Point Analysis
1. **Problem Background**:
- Deep neural networks have achieved remarkable success on large-scale, high-quality datasets.
- However, real-world datasets often contain label noise, which can cause the model to overfit and hurt its generalization ability.
2. **Limitations of Existing Methods**:
- **Loss Correction Methods**: These methods correct the loss function by estimating the noise transition matrix, but they usually require assumptions about the noise model and are difficult to implement.
- **Sample Re-weighting Methods**: These methods reduce the influence of noisy samples by re-assigning sample weights, but they may ignore or underweight some training data in high-noise scenarios.
- **Pseudo-label Methods**: These methods use the pseudo-labels predicted by the network to recalibrate the labels, but static pseudo-label weights may lead to overfitting and insufficient label correction.
3. **Core Ideas of the L2B Method**:
- **Dynamically Adjusting Weights**: L2B uses a meta-learning framework to dynamically adjust the weights between real labels and pseudo-labels, as well as the weights between different samples.
- **Implicit Re-labeling**: L2B re-weights the real-label and pseudo-label loss terms separately, which amounts to implicit re-labeling without explicitly generating new training targets.
- **Little or No Validation Data**: L2B can avoid relying on a clean validation set by generating meta-sets online, which makes the method more practical and flexible.
4. **Experimental Results**:
- L2B is evaluated on multiple datasets, including natural image datasets (such as CIFAR-10 and CIFAR-100), medical image datasets (such as ISIC2019), and real-world noisy datasets (such as Clothing1M).
- The experimental results show that L2B improves model performance across various noise levels, especially in high-noise settings.
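The implicit re-labeling idea in item 3 can be checked numerically. The sketch below assumes a cross-entropy loss, which is linear in the target, so a weighted sum of the real-label and pseudo-label terms equals a single rescaled loss against a blended soft target. The probabilities and the weights `alpha` and `beta` are made-up illustrative values, not numbers from the paper.

```python
import numpy as np

def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a (possibly soft) target."""
    return -np.sum(target * np.log(probs))

# Illustrative values for a single sample with three classes (assumed, not from the paper).
probs = np.array([0.6, 0.3, 0.1])        # model prediction
y_real = np.array([0.0, 1.0, 0.0])       # observed, possibly noisy, one-hot label
y_pseudo = np.eye(3)[probs.argmax()]     # pseudo-label: one-hot of the arg-max class
alpha, beta = 0.3, 0.7                   # hypothetical per-sample weights

# Two separately weighted loss terms, as in the L2B objective.
two_term = alpha * cross_entropy(probs, y_real) + beta * cross_entropy(probs, y_pseudo)

# Equivalent view: one loss against a blended target, scaled by (alpha + beta).
soft_target = (alpha * y_real + beta * y_pseudo) / (alpha + beta)
relabeled = (alpha + beta) * cross_entropy(probs, soft_target)

print(np.isclose(two_term, relabeled))
```

Under this assumption, the ratio of `alpha` to `beta` sets the corrected label and their sum sets how much the sample counts, which is why no explicit new target has to be generated.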
### Formula Analysis
- **Pseudo-label Generation**:
\[
y_{\text{pseudo}}^i=\arg \max_{l = 1,\ldots,L}P(x_i,\theta)
\]
- **Traditional Bootstrapping Loss**:
\[
\theta^*=\arg \min_{\theta}\sum_{i = 1}^N L(F(x_i,\theta),\beta y_{\text{real}}^i+(1 - \beta)y_{\text{pseudo}}^i)
\]
- **L2B Optimization Objective**:
\[
\theta^*(\alpha,\beta)=\arg \min_{\theta}\sum_{i = 1}^N\left(\alpha_i L(F(x_i,\theta),y_{\text{real}}^i)+\beta_i L(F(x_i,\theta),y_{\text{pseudo}}^i)\right)
\]
- **Meta-learning Update**:
\[
(\alpha^*,\beta^*)=\arg \min_{\alpha,\beta\geq0}\frac{1}{M}\sum_{i = 1}^M L(F(x_v^i,\theta^*(\alpha,\beta)),y_v^i)
\]
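The formulas above can be tied together in a small NumPy sketch. It is not the authors' implementation. It assumes a linear softmax classifier and approximates the bilevel problem with a single look-ahead step in which the weights start at zero, in the style of learning-to-reweight methods. Under that assumption each weight is the clipped inner product between a sample's gradient and the meta-set gradient. The function name `l2b_step`, the normalization of the weights to sum to one, and the random data are all illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def l2b_step(W, x, y_real, x_val, y_val, lr=0.1):
    """One hypothetical L2B-style update for a linear softmax classifier W (D x L)."""
    eye = np.eye(W.shape[1])
    probs = softmax(x @ W)
    y_pseudo = probs.argmax(axis=1)       # pseudo-label: most likely class

    # For softmax + cross-entropy, d(loss_i)/d(logits_i) = probs_i - target_i.
    r_real = probs - eye[y_real]
    r_pseudo = probs - eye[y_pseudo]

    # Gradient of the mean meta-set (validation) loss with respect to W.
    g_val = x_val.T @ (softmax(x_val @ W) - eye[y_val]) / len(x_val)

    # Inner product of each per-sample gradient outer(x_i, r_i) with g_val.
    proj = x @ g_val
    alpha = np.maximum(0.0, (proj * r_real).sum(axis=1))
    beta = np.maximum(0.0, (proj * r_pseudo).sum(axis=1))

    # Normalise so all weights sum to one (an assumption of this sketch).
    total = alpha.sum() + beta.sum()
    if total > 0:
        alpha, beta = alpha / total, beta / total

    # Gradient step on the weighted two-term objective.
    grad = x.T @ (alpha[:, None] * r_real + beta[:, None] * r_pseudo)
    return W - lr * grad, alpha, beta

# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
x, y_real = rng.normal(size=(8, 4)), rng.integers(0, 3, size=8)
x_val, y_val = rng.normal(size=(5, 4)), rng.integers(0, 3, size=5)
W = rng.normal(scale=0.1, size=(4, 3))
W_new, alpha, beta = l2b_step(W, x, y_real, x_val, y_val)
```

A sample whose real-label gradient points away from the meta-set gradient gets `alpha = 0`, while its pseudo-label term can still contribute through `beta`. This matches the summary's point that corrupted instances are re-weighted, not simply discarded.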
### Conclusion
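L2B mitigates label noise by using meta-learning to assign each sample separate weights for its observed-label and pseudo-label loss terms. This combines instance re-weighting with implicit re-labeling in a single objective, and the reported experiments show gains on both synthetic and real-world noise, especially at high noise levels.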