Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration

Manyi Zhang, Yuxin Ren, Zihao Wang, Chun Yuan

DOI: https://doi.org/10.48550/arXiv.2210.05126

2022-10-11

Abstract: Instance-dependent label noise is realistic but rather challenging, where the label-corruption process depends on instances directly. It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models. Prior works put great effort into tackling the issue. Unfortunately, these works always highly rely on strong assumptions or remain heuristic without theoretical guarantees. In this paper, to address the distribution shift in learning with instance-dependent label noise, a dynamic distribution-calibration strategy is adopted. Specifically, we hypothesize that, before training data are corrupted by label noise, each class conforms to a multivariate Gaussian distribution at the feature level. Label noise produces outliers to shift the Gaussian distribution. During training, to calibrate the shifted distribution, we propose two methods based on the mean and covariance of multivariate Gaussian distribution respectively. The mean-based method works in a recursive dimension-reduction manner for robust mean estimation, which is theoretically guaranteed to train a high-quality model against label noise. The covariance-based method works in a distribution disturbance manner, which is experimentally verified to improve the model robustness. We demonstrate the utility and effectiveness of our methods on datasets with synthetic label noise and real-world unknown noise.

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the distribution shift that instance-dependent label noise causes in machine learning. With instance-dependent label noise, the probability that a label is corrupted depends directly on the data instance itself. The training distribution therefore differs substantially from the test distribution, which impairs the generalization of the trained model. This kind of noise is harder to handle than instance-independent label noise because the resulting distribution shift is more severe, so the model performs poorly on test data.

To meet this challenge, the paper proposes a dynamic distribution-calibration strategy. It assumes that, before label noise is introduced, the features of each class follow a multivariate Gaussian distribution. Label noise produces outliers that shift this Gaussian distribution. To calibrate the shifted distributions during training, the paper proposes two methods:

1. **Mean-based method**: Estimates the class mean robustly through recursive dimension reduction. This method comes with a theoretical guarantee that a high-quality model can be trained despite label noise.
2. **Covariance-based method**: Disturbs the empirical covariance of the given data, which increases the diversity of the training data and reduces overfitting to the shifted distribution. Experiments verify that this method improves model robustness.

Both methods aim to improve generalization on test data by calibrating the data distribution affected by label noise. The paper demonstrates the effectiveness and practicality of the proposed methods through experiments on datasets with synthetic label noise and with real-world, unknown noise.
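The calibration idea can be illustrated with a small NumPy sketch. It is not the paper's algorithm: the coordinate-wise median stands in for the recursive dimension-reduction mean estimator, and the disturbance `alpha * I` added to the covariance is an assumed form chosen for illustration. All function names and the parameter `alpha` are hypothetical.

```python
import numpy as np


def robust_mean(features):
    """Coordinate-wise median, a simple stand-in for a robust mean estimator.

    Assumption: the paper uses recursive dimension reduction instead.
    """
    return np.median(features, axis=0)


def disturbed_covariance(features, alpha=0.1):
    """Empirical covariance plus an isotropic disturbance alpha * I.

    Assumption: the exact form of the disturbance is hypothetical.
    """
    cov = np.cov(features, rowvar=False)
    return cov + alpha * np.eye(features.shape[1])


def sample_calibrated(features, n_samples, alpha=0.1, seed=0):
    """Draw synthetic features from the calibrated class-wise Gaussian."""
    rng = np.random.default_rng(seed)
    mu = robust_mean(features)
    cov = disturbed_covariance(features, alpha)
    return rng.multivariate_normal(mu, cov, size=n_samples)


# Toy class: 900 clean feature vectors around 0, plus 100 outliers around 8
# that mimic samples pulled into the class by wrong labels.
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(900, 4))
outliers = rng.normal(loc=8.0, scale=1.0, size=(100, 4))
feats = np.vstack([clean, outliers])

naive_mu = feats.mean(axis=0)      # pulled toward the outliers
robust_mu = robust_mean(feats)     # stays closer to the clean mean
synthetic = sample_calibrated(feats, n_samples=50)

print("naive mean norm: ", np.linalg.norm(naive_mu))
print("robust mean norm:", np.linalg.norm(robust_mu))
print("synthetic shape: ", synthetic.shape)
```

In this toy setting the robust estimate stays much closer to the clean class mean than the plain sample mean does, and the disturbed covariance widens the Gaussian from which extra training features are drawn.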


Tackling Instance-Dependent Label Noise Via a Universal Probabilistic Model

Instance-dependent Label Distribution Estimation for Learning with Label Noise

Instance-dependent Label-noise Learning under a Structural Causal Model

Instance-specific Label Distribution Regularization for Learning with Label Noise

Learning with Feature-Dependent Label Noise: A Progressive Approach

Feature-Induced Label Distribution for Learning with Noisy Labels

Robust Training under Label Noise by Over-parameterization

A Model-Agnostic Approach for Learning with Noisy Labels of Arbitrary Distributions

Confidence Scores Make Instance-dependent Label-noise Learning Possible

Uncertainty-Aware Learning against Label Noise on Imbalanced Datasets

Dynamic training for handling textual label noise

Co-LDL: A Co-Training-Based Label Distribution Learning Method for Tackling Label Noise

Learning with Bounded Instance- and Label-dependent Label Noise

Class-Wise Denoising for Robust Learning under Label Noise

When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method

A Convergence Path to Deep Learning on Noisy Labels

Generative Calibration of Inaccurate Annotation for Label Distribution Learning

Causality Encourages the Identifiability of Instance-Dependent Label Noise

Safeguarded Dynamic Label Regression for Generalized Noisy Supervision

Safeguarded Dynamic Label Regression for Noisy Supervision