A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro
DOI: https://doi.org/10.1109/SIBGRAPI55357.2022.9991791
2023-08-07
Abstract: Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness when using different data augmentations and the improvement they bring to training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise for the datasets MNIST, CIFAR-10, and CIFAR-100, and for the real-world dataset Clothing1M. We evaluate the methods using the accuracy metric. Results show that the appropriate selection of data augmentation can drastically improve model robustness to label noise, increasing the best test accuracy by up to 177.84% in relative terms compared to the baseline with no augmentation, and by up to 6% in absolute value with the state-of-the-art DivideMix training strategy.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper studies how data augmentation affects the training of Convolutional Neural Networks (CNNs) in the presence of noisy labels. Specifically, the authors analyze how different data augmentation methods affect model robustness and explore which augmentation strategies or combinations improve performance under varying levels of label noise. They evaluate classical and state-of-the-art augmentation methods under different levels of synthetic noise on MNIST, CIFAR-10, and CIFAR-100, and on the real-world dataset Clothing1M, using accuracy as the metric. The results indicate that an appropriate choice of augmentation can substantially improve robustness to noisy labels: the best test accuracy increases by up to 177.84% in relative terms compared to the baseline with no augmentation, and by up to 6% in absolute terms when augmentation is combined with the state-of-the-art DivideMix training strategy. This highlights the importance of data augmentation as a design decision, especially when training on datasets that contain noisy labels.
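To make the terms "synthetic noise level", "relative increase", and "absolute increase" concrete, here is a minimal Python sketch. It is not the authors' code. The function names are hypothetical, and the noise model (each selected label is replaced by a different class chosen uniformly at random) is an assumption, since the summary above does not specify the exact noise protocol used in the paper.

```python
import random


def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` of the
    entries is replaced by a different class chosen uniformly at random.

    Assumption: the replacement class always differs from the original one.
    Some noise protocols also allow the original class to be re-drawn.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    # Sample distinct positions so exactly n_flip labels are corrupted.
    for i in rng.sample(range(len(noisy)), n_flip):
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy


def relative_gain(baseline_acc, new_acc):
    """Relative improvement in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


def absolute_gain(baseline_acc, new_acc):
    """Absolute improvement in percentage points: new - baseline."""
    return new_acc - baseline_acc
```

For example, calling inject_symmetric_noise on 1000 labels with noise_rate=0.4 changes exactly 400 of them. With made-up accuracies of 20.0 and 50.0 (illustrative values only, not results from the paper), relative_gain gives 150.0 while absolute_gain gives 30.0. This is why a relative figure such as 177.84% and an absolute figure such as 6% are not directly comparable.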