A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro

DOI: https://doi.org/10.1109/SIBGRAPI55357.2022.9991791

2023-08-07

Abstract: Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness under different data augmentations and the improvement they bring to training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise for the datasets MNIST, CIFAR-10, and CIFAR-100, and for the real-world dataset Clothing1M. We evaluate the methods using the accuracy metric. Results show that an appropriate selection of data augmentation can drastically improve model robustness to label noise, with a relative improvement in best test accuracy of up to 177.84% over the baseline with no augmentation, and an absolute increase of up to 6% with the state-of-the-art DivideMix training strategy.
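The abstract reports one gain in relative terms (177.84%) and another in absolute terms (6%). The sketch below shows how the two measures differ. The function names and the accuracy values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def absolute_gain(baseline_acc, new_acc):
    """Difference in accuracy, in percentage points."""
    return new_acc - baseline_acc


def relative_gain(baseline_acc, new_acc):
    """Improvement expressed as a percentage of the baseline accuracy."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies in percent (assumed values, not results from the paper).
baseline, augmented = 20.0, 50.0
print(absolute_gain(baseline, augmented))  # 30.0
print(relative_gain(baseline, augmented))  # 150.0
```

A low baseline makes the relative figure large even for a moderate absolute gain, which is why a relative improvement can exceed 100%.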

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper studies the impact of data augmentation on the training of Convolutional Neural Networks (CNNs) in the presence of noisy labels. Specifically, the authors analyze how different data augmentation methods affect model robustness and which augmentation strategies improve performance under varying levels of label noise. They evaluate both classical and state-of-the-art data augmentation methods and compare their performance across several datasets: MNIST, CIFAR-10, and CIFAR-100 with synthetic noise, and the real-world dataset Clothing1M. The results indicate that an appropriate choice of augmentation can significantly improve robustness to noisy labels: best test accuracy increases by up to 177.84% in relative terms compared to the baseline with no augmentation, and by up to 6% in absolute terms with the state-of-the-art DivideMix training strategy. This highlights the importance of data augmentation as a design decision, especially for datasets containing noisy labels.
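The summary mentions synthetic label noise at varying levels but does not describe how it is generated. As a minimal sketch, the snippet below injects symmetric (uniform) label noise, one common protocol in the noisy-label literature; whether the paper uses exactly this variant is an assumption. The function name `add_symmetric_noise` and its parameters are hypothetical.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` in which a `noise_rate` fraction of entries
    is flipped to a different class chosen uniformly at random.

    Assumption: flipped labels never keep their original class. Some
    protocols also allow the true class as a target, which lowers the
    effective noise rate.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    # Pick distinct positions to corrupt.
    for i in rng.sample(range(len(noisy)), n_flip):
        other_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


# Toy labels for a 10-class problem (illustrative data, not a real dataset).
clean = [i % 10 for i in range(1000)]
noisy = add_symmetric_noise(clean, num_classes=10, noise_rate=0.4)
changed = sum(c != n for c, n in zip(clean, noisy))
print(changed / len(clean))  # 0.4
```

Training on `noisy` while evaluating on clean test labels is the usual way to measure how much a given augmentation helps at each noise level.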

Data-Efficient Augmentation for Training Neural Networks

What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

Data Augmentation Can Improve Robustness

Improving Deep Learning using Generic Data Augmentation

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Enhancing Performance of Deep Learning Models with a Novel Data Augmentation Approach

ConfidentMix: Confidence-Guided Mixup for Learning With Noisy Labels

Hologram Noise Model for Data Augmentation and Deep Learning

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

On-the-fly Denoising for Data Augmentation in Natural Language Understanding

Towards Efficient Data-Centric Robust Machine Learning with Noise-based Augmentation

Exploring Data Augmentation Methods on Social Media Corpora

On the Generalization Effects of Linear Transformations in Data Augmentation

Interpretability-Mask: a Label-Preserving Data Augmentation Scheme for Better Classification

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

Noisy Label Processing for Classification: A Survey

Robust Classification by Coupling Data Mollification with Label Smoothing

Orthogonal Transform-Driven Data Augmentation for Limited Gaussian-Tainted Dataset

Is augmentation effective to improve prediction in imbalanced text datasets?