Robust Testing for Deep Learning using Human Label Noise

Gordon Lim, Stefan Larson, Kevin Leach
2024-11-30
Abstract: In deep learning (DL) systems, label noise in training datasets often degrades model performance, as models may learn incorrect patterns from mislabeled data. The area of Learning with Noisy Labels (LNL) has introduced methods to effectively train DL models in the presence of noisily-labeled datasets. Traditionally, these methods are tested using synthetic label noise, where ground truth labels are randomly (and automatically) flipped. However, recent findings highlight that models perform substantially worse under human label noise than synthetic label noise, indicating a need for more realistic test scenarios that reflect noise introduced due to imperfect human labeling. This underscores the need for generating realistic noisy labels that simulate human label noise, enabling rigorous testing of deep neural networks without the need to collect new human-labeled datasets. To address this gap, we present Cluster-Based Noise (CBN), a method for generating feature-dependent noise that simulates human-like label noise. Using insights from our case study of label memorization in the CIFAR-10N dataset, we design CBN to create more realistic tests for evaluating LNL methods. Our experiments demonstrate that current LNL methods perform worse when tested using CBN, highlighting its use as a rigorous approach to testing neural networks. Next, we propose Soft Neighbor Label Sampling (SNLS), a method designed to handle CBN, demonstrating its improvement over existing techniques in tackling this more challenging type of noise.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the performance degradation of deep learning (DL) systems caused by human label noise. When a training dataset contains mislabeled examples, deep neural networks (NNs) may learn incorrect patterns, which hurts their ability to generalize. Most existing work tests and improves DL models with synthetic label noise, that is, by randomly flipping ground-truth labels. However, recent studies show that models perform substantially worse under human label noise than under synthetic noise. Existing evaluations therefore fail to capture the more complex noise that imperfect human labeling introduces in the real world, and more realistic test scenarios are needed to assess how robust DL models are to it.

To address this problem, the paper makes two main contributions:

1. **Cluster-Based Noise (CBN)**: A noise generation method that simulates human label noise. CBN produces feature-dependent noise by selecting random centroids in the CLIP feature space and flipping the labels of samples within a given radius of them. Because the flipped labels are concentrated in regions of feature space where images are similar, rather than spread uniformly at random, CBN confronts the model with a more realistic and more challenging noise setting during training.
2. **Soft Neighbor Label Sampling (SNLS)**: A soft-label sampling method designed to handle the noise that CBN generates. SNLS builds a soft label distribution for each image from the labels of its neighbors in the CLIP feature space, which helps the model retain uncertainty about possibly wrong labels during learning and thereby improves its performance under noise.
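The two contributions can be illustrated with a small NumPy sketch. It is not the authors' code. The summary only states that CBN picks random centroids in an embedding space, starts from per-class mean centroids, and flips labels within a radius, so the remaining details below are assumptions: sub-cluster centroids are drawn as random samples of each class, and a flipped sample takes the label of the nearest class mean other than its own. The SNLS part likewise assumes that the soft label is the label histogram of a sample's k nearest neighbours (itself included) and that one hard label is drawn from it per epoch. The function names, `k`, and the toy Gaussian-blob data standing in for t-SNE-reduced CLIP features are all hypothetical.

```python
import numpy as np


def cluster_based_noise(features, labels, n_centroids, radius, rng):
    """Hypothetical CBN-style sketch: flip labels inside balls around random sub-cluster centroids."""
    classes = np.unique(labels)
    # Per-class mean embeddings u_y.
    class_means = {y: features[labels == y].mean(axis=0) for y in classes}
    noisy = labels.copy()
    for y in classes:
        others = [c for c in classes if c != y]
        if not others:
            continue  # single-class data: nothing to flip to
        idx = np.flatnonzero(labels == y)
        # Assumption: sub-cluster centroids are randomly chosen samples of class y.
        picks = rng.choice(idx, size=min(n_centroids, idx.size), replace=False)
        centers = features[picks]
        # Distances from every class-y sample to every sub-cluster centroid: (len(idx), n).
        dists = np.linalg.norm(features[idx][:, None, :] - centers[None, :, :], axis=-1)
        inside = idx[(dists <= radius).any(axis=1)]
        if inside.size == 0:
            continue
        # Assumption: a flipped sample takes the label of the nearest other class mean.
        other_means = np.stack([class_means[c] for c in others])
        d_other = np.linalg.norm(features[inside][:, None, :] - other_means[None, :, :], axis=-1)
        noisy[inside] = np.asarray(others)[d_other.argmin(axis=1)]
    return noisy


def soft_neighbor_labels(features, noisy_labels, k, num_classes):
    """Hypothetical SNLS-style sketch: label histogram of each sample's k nearest neighbours."""
    # Full pairwise distance matrix; quadratic in N, fine only for small examples.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Assumption: the sample itself (distance 0) counts as one of its k neighbours.
    nn = np.argsort(d, axis=1, kind="stable")[:, :k]
    soft = np.stack([np.bincount(noisy_labels[row], minlength=num_classes) for row in nn])
    return soft / soft.sum(axis=1, keepdims=True)


def sample_labels(soft, rng):
    """Draw one hard training label per sample from its soft distribution (e.g. once per epoch)."""
    return np.array([rng.choice(soft.shape[1], p=p) for p in soft])


rng = np.random.default_rng(0)
# Toy 2-D "embeddings": two Gaussian blobs standing in for two classes.
feats = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
clean = np.repeat([0, 1], 50)
noisy = cluster_based_noise(feats, clean, n_centroids=2, radius=1.0, rng=rng)
soft = soft_neighbor_labels(feats, noisy, k=10, num_classes=2)
train_labels = sample_labels(soft, rng)
print("flipped:", int((noisy != clean).sum()), "of", clean.size)
```

The contrast with synthetic noise is visible in the data flow: flips happen only for samples that are close to a sampled centroid, so errors cluster in feature space instead of being spread uniformly. Drawing a hard label from the neighbour histogram, rather than training on the histogram directly, is itself an assumption about what "sampling" means in SNLS; at dataset scale, a nearest-neighbour index would replace the quadratic distance matrix.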
Overall, by introducing a noise model (CBN) that is closer to real application scenarios together with a matching remedy (SNLS), the paper aims to advance research on learning with noisy labels and to improve the robustness and generalization of models facing complex, real-world noise.

### Key formulas and concepts

- **Label memorization**: Label memorization is defined as
  \[
  \text{mem}(A, S, i) := \Pr_{h\sim A(S)}[h(x_i) = y_i] - \Pr_{h\sim A(S\setminus i)}[h(x_i) = y_i]
  \]
  where:
  - \( S \) is the training dataset;
  - \( S\setminus i \) is the dataset with the \( i \)-th sample removed;
  - \( h \) is a model produced by the learning algorithm \( A \);
  - the first probability term, called the inclusion probability, is the probability that the model predicts the label \( y_i \) for \( x_i \) when \((x_i, y_i)\) is included in the training set;
  - the second probability term, called the exclusion probability, is the probability that the model still predicts \( y_i \) for \( x_i \) when \((x_i, y_i)\) is removed.

  A sample therefore has a high memorization score when a model is much more likely to predict its label \( y_i \) if the sample was in the training set than if it was not.
- **Cluster-Based Noise (CBN) algorithm**: The main steps of the CBN algorithm are as follows:
  1. Initialization: the inputs are the dataset \( D \), the t-SNE-transformed CLIP feature embeddings \( C \), the set of unique labels \( Y \), the number of sub-cluster centroids \( n \), and the radius \( r \) for label flipping.
  2. For each label \( y\in Y \), initialize the centroid \( u_y \) as the mean of the feature embeddings of all samples with that label.
  3.