Deep Learning Classification With Noisy Labels

Guillaume Sanchez, Vincente Guis, Ricard Marxer, Frédéric Bouchara
DOI: https://doi.org/10.1109/ICMEW46912.2020.9105992
2020-04-23
Abstract: Deep Learning systems have shown tremendous accuracy in image classification, at the cost of big image datasets. Collecting such amounts of data can lead to labelling errors in the training set. Indexing multimedia content for retrieval, classification or recommendation can involve tagging or classification based on multiple criteria. In our case, we train face recognition systems for actors identification with a closed set of identities while being exposed to a significant number of perturbators (actors unknown to our database). Face classifiers are known to be sensitive to label noise. We review recent works on how to manage noisy annotations when training deep learning classifiers, independently from our interest in face recognition.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to train deep learning classifiers on datasets whose labels are noisy. Specifically, the authors focus on how to train deep learning models effectively when many training samples are mislabeled, so that performance and accuracy degrade as little as possible.

### Problem Background

When large-scale image datasets are built from wide and diverse data sources, some mislabeling is practically unavoidable. Noisy labels harm training and lower model performance. The effect is particularly significant in tasks such as face recognition, where classifiers are known to be sensitive to label noise. How to handle noisy labels has therefore become an important research topic.

### Main Problems

1. **The impact of noisy labels on model performance**: Noisy labels can mislead the learning process, causing the model to overfit the noise or fail to generalize correctly.
2. **How to detect and correct noisy labels**: Effective techniques are needed to identify and correct noisy labels in the dataset.
3. **How to design robust learning algorithms**: Learning algorithms are needed that can still train high-performance classifiers in the presence of noisy labels.

### Solution Overview

The paper reviews recent research on the noisy-label problem and presents four main families of solutions:

1. **Prediction Reweighting**: Adjust the model's predictions using an estimated confusion matrix, so that the loss reflects the uncertainty of each observed label.
2. **Sample Importance Reweighting**: Adjust the importance of each sample in training according to the probability that it is contaminated by noise.
3. **Unlabeling**: Treat samples considered noisy as unlabeled and train with semi-supervised or unsupervised methods.
4. **Label Fixing**: Try to correct noisy labels directly so that the model can be trained on the correct labels.

### Experimental Verification

To verify the effectiveness of these methods, the authors use multiple public datasets (such as CIFAR-10, MNIST, Clothing1M, Food-101N, and WebVision) and conduct experiments by introducing artificial label noise. The results show that the methods perform differently under different types of noise, each with its own advantages and disadvantages.

### Conclusion

The paper summarizes existing methods for handling noisy labels and points out the application scenarios and limitations of each. Future research can further explore how to combine multiple methods to better handle complex and diverse label noise.
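
As an illustration of the prediction reweighting idea and of the artificial noise used in the experiments, here is a minimal sketch in plain Python. It assumes symmetric (uniform) label noise and a known transition matrix `T[i][j] = P(observed label = j | true label = i)`. In practice that matrix has to be estimated, and the paper may use a different formulation. The variant shown is commonly called forward correction. All function names are hypothetical and chosen for this example only.

```python
import math
import random


def symmetric_transition_matrix(num_classes, noise_rate):
    """Build T with T[i][j] = P(observed = j | true = i) for uniform label flips.

    Assumes num_classes >= 2 and 0 <= noise_rate <= 1.
    """
    off_diagonal = noise_rate / (num_classes - 1)
    return [
        [1.0 - noise_rate if i == j else off_diagonal for j in range(num_classes)]
        for i in range(num_classes)
    ]


def corrupt_labels(labels, transition, rng):
    """Draw one observed (possibly wrong) label per true label from its row of T."""
    classes = list(range(len(transition)))
    return [rng.choices(classes, weights=transition[y])[0] for y in labels]


def forward_corrected_probs(clean_probs, transition):
    """Turn clean-class probabilities p into noisy-label probabilities q.

    q[j] = sum_i p[i] * T[i][j]
    """
    num_classes = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


def forward_corrected_loss(clean_probs, observed_label, transition):
    """Cross-entropy of the observed label under the noise-adjusted prediction."""
    noisy_probs = forward_corrected_probs(clean_probs, transition)
    return -math.log(noisy_probs[observed_label])


# Small demo: inject 30% symmetric noise, then score one prediction.
transition = symmetric_transition_matrix(num_classes=3, noise_rate=0.3)
observed = corrupt_labels([0, 1, 2, 0], transition, random.Random(0))
loss = forward_corrected_loss([0.9, 0.05, 0.05], observed[0], transition)
print(observed, round(loss, 3))
```

The point of the correction is that a confident, correct prediction is not penalized without bound when the observed label happens to be wrong: the loss is computed against the distribution of noisy labels the model should expect to see, not against the clean prediction directly.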