Abstract: Existing research on learning with noisy labels mainly focuses on synthetic label noise. Although synthetic noise has a clean structure that greatly facilitates statistical analysis, it often fails to model real-world noise patterns. Recent literature has made several efforts to offer real-world noisy datasets, yet these efforts suffer from two caveats: (1) the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise; (2) the datasets are often large-scale, which may result in unfair comparisons of robust methods within reasonable and accessible computation power. To better understand real-world label noise, it is crucial to build controllable, moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, which equip the training sets of CIFAR-10 and CIFAR-100 with human-annotated real-world noisy labels that we collected from Amazon Mechanical Turk. We show quantitatively and qualitatively that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of existing solutions using CIFAR-10N and CIFAR-100N. We further study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. We show that real-world noise patterns indeed impose new and outstanding challenges compared to synthetic label noise. These observations require us to rethink the treatment of noisy labels, and we hope the availability of these two datasets will facilitate the development and evaluation of future solutions for learning with noisy labels.
Datasets and leaderboards are available at http://noisylabels.com.
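
To make the class-dependent noise model mentioned in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes a hypothetical 3-class transition matrix `T` with illustrative values, flips each clean label using only its class, and then recovers the matrix by counting (clean, noisy) pairs. That counting step is only possible when both ground-truth and noisy labels are available. The function names are illustrative as well.

```python
import random
from collections import Counter


def inject_class_dependent_noise(clean_labels, transition, seed=0):
    """Sample a noisy label for each clean label y from row transition[y].

    The flip probability depends only on the clean class, never on the
    individual instance; this is what "class-dependent" means here.
    """
    rng = random.Random(seed)
    classes = list(range(len(transition)))
    return [rng.choices(classes, weights=transition[y], k=1)[0]
            for y in clean_labels]


def empirical_transition(clean_labels, noisy_labels, num_classes):
    """Estimate P(noisy = j | clean = i) by counting label pairs."""
    counts = Counter(zip(clean_labels, noisy_labels))
    matrix = []
    for i in range(num_classes):
        row_total = sum(counts[(i, j)] for j in range(num_classes))
        matrix.append([counts[(i, j)] / row_total if row_total else 0.0
                       for j in range(num_classes)])
    return matrix


# Hypothetical transition matrix (illustrative values, not from the paper).
T = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.0, 0.3, 0.7]]

clean = [i % 3 for i in range(30000)]          # balanced toy labels
noisy = inject_class_dependent_noise(clean, T, seed=0)
T_hat = empirical_transition(clean, noisy, 3)  # estimate of T from label pairs
noise_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
```

With enough samples, `T_hat` should approach `T` under this synthetic model. A single matrix of this kind cannot express instance-dependent noise, where the flip probability also varies with the individual example.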

An Empirical Study of the Noise Impact on Cost-Sensitive Learning

Cost-guided class noise handling for effective cost-sensitive learning

Class Noise Handling for Effective Cost-Sensitive Learning by Cost-Guided Iterative Classification Filtering

Noise is the Fatal Poison: A Noise-aware Network for Noisy Dataset Classification

Class Noise Vs. Attribute Noise: A Quantitative Study

The Influence of Class Imbalance on Cost-Sensitive Learning: An Empirical Study

Mitigating the impact of mislabeled data on deep predictive models: an empirical study of learning with noise approaches in software engineering tasks

Cost Sensitive Semi-Supervised Laplacian Support Vector Machine

Towards cost-sensitive learning for real-world applications

Adaptive Cost-Sensitive Learning in Neural Networks for Misclassification Cost Problems

Privacy-Preserving Cost-Sensitive Learning

An adaptive cost-sensitive learning approach in neural networks to minimize local training–test class distributions mismatch

Which noise affects algorithm robustness for learning to rank

A Theoretical Analysis of Learning with Noisily Labeled Data

Training cost-sensitive neural networks with methods addressing the class imbalance problem

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Label Noise: Ignorance Is Bliss

A Novel Class Noise Estimation Method and Application in Classification

Improve Cost Efficiency of Active Learning over Noisy Dataset

Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation

Noisy Label Processing for Classification: A Survey