Abstract: Deep neural networks (DNNs) have advanced many machine learning tasks, but their performance is often harmed by noisy labels in real-world data. Addressing this, we introduce CoLafier, a novel approach that uses Local Intrinsic Dimensionality (LID) for learning with noisy labels. CoLafier consists of two subnets: LID-dis and LID-gen. LID-dis is a specialized classifier. Trained with our uniquely crafted scheme, LID-dis consumes both a sample's features and its label to predict the label, which allows it to produce an enhanced internal representation. We observe that LID scores computed from this representation effectively distinguish between correct and incorrect labels across various noise scenarios. In contrast to LID-dis, LID-gen, functioning as a regular classifier, operates solely on the sample's features. During training, CoLafier utilizes two augmented views per instance to feed both subnets. CoLafier considers the LID scores from the two views as produced by LID-dis to assign weights in an adapted loss function for both subnets. Concurrently, LID-gen, serving as a classifier, suggests pseudo-labels. LID-dis then processes these pseudo-labels along with the two views to derive LID scores. Finally, these LID scores, along with the differences in predictions from the two subnets, guide the label update decisions. This dual-view and dual-subnet approach enhances the overall reliability of the framework. Upon completion of the training, we deploy the LID-gen subnet of CoLafier as the final classification model. CoLafier demonstrates improved prediction accuracy, surpassing existing methods, particularly under severe label noise. For more details, see the code at

What problem does this paper attempt to address?

This paper addresses accurate classification when the training labels are noisy. Deep neural networks (DNNs) often encounter noisy labels in real-world data, and these labels can impair a model's performance and generalization ability. The paper therefore proposes a new method, CoLafier, which uses Local Intrinsic Dimensionality (LID) for learning with noisy labels.

### Problem Definition

The input is a training set $\tilde{D}=\{(x_i,\tilde{y}_i)\}_{i=1}^N$ containing noisy labels, where each $\tilde{y}_i$ is a one-hot vector representing the noisy label of instance $x_i$. The goal is to train a robust classification model $f(x;\Theta)\to\hat{y}$ that accurately predicts the true label $y_i$ of each instance without prior knowledge of the quality or correctness of the labels.

### Challenges

1. **Lack of knowledge about noise proportion and pattern**: Without knowing the proportion and pattern of noisy labels in the dataset, it is difficult to develop a general method that collects enough clean labels to train a strong model.
2. **Accumulation of errors during training**: Mistakes made in early sample selection or label correction can accumulate, leading to larger errors and pulling the model away from the expected results.

### Proposed Method

To address these challenges, the paper introduces the CoLafier framework, which uses LID scores to distinguish between correctly and incorrectly labeled samples. It consists of the following parts:

1. **LID-dis sub-network**: A specialized classifier that takes both a sample's features and its label as input and predicts the label. Through its training scheme, LID-dis produces an enhanced internal representation, and LID scores computed from that representation effectively separate correct labels from incorrect ones.
2. **LID-gen sub-network**: A regular classification model that operates only on the sample's features. It suggests pseudo-labels during training and is deployed as the final classifier.
3. **Dual-view and dual-sub-network training**: During training, CoLafier feeds two augmented views of each instance to both sub-networks. The LID scores that LID-dis produces for the two views determine the weight of each instance in the loss function and guide the label update decisions. This design enhances the reliability of the whole framework.

### Main Contributions

1. **Innovative use of LID scores**: The paper develops the LID-dis sub-network, which processes a sample's features and label together, generates an enhanced representation, and effectively distinguishes correct from incorrect labels under different noise conditions.
2. **Introduction of the CoLafier framework**: The framework combines the LID-dis and LID-gen sub-networks, uses the LID scores from the two augmented views to weight the loss function, and guides label updates according to both the LID scores and the differences between the two sub-networks' predictions.
3. **Experimental validation**: Even without explicit information about the noise characteristics, CoLafier outperforms existing methods under various noise conditions, particularly under severe label noise.
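The summary relies on LID scores but does not say how such a score is computed. The following is a minimal sketch, assuming the widely used maximum-likelihood LID estimator over k-nearest-neighbor distances within a batch; the text above does not confirm which estimator CoLafier uses. The function name `lid_mle`, the parameter `k`, and the synthetic data are illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np


def lid_mle(batch: np.ndarray, k: int = 20) -> np.ndarray:
    """Estimate one LID score per row of `batch` (shape: n_samples x n_features).

    Assumed estimator: LID(x) = -1 / mean_i(log(r_i / r_k)), where r_1..r_k
    are the distances from x to its k nearest neighbors inside the batch.
    """
    eps = 1e-12
    # Pairwise Euclidean distances between all rows (n x n matrix).
    diff = batch[:, None, :] - batch[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    # Sort each row; column 0 is the zero distance to the point itself.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    r_max = np.maximum(knn[:, -1:], eps)
    # Clamp to avoid log(0) when a batch contains duplicate points.
    log_ratio = np.log(np.maximum(knn / r_max, eps))
    mean_log = log_ratio.mean(axis=1)
    # mean_log is <= 0; keep it strictly negative so the score stays finite.
    return -1.0 / np.minimum(mean_log, -eps)


# Hypothetical stand-ins for internal representations of a batch:
# points on a 2-D subspace of a 32-D space versus points filling all 32 dims.
rng = np.random.default_rng(0)
low_dim = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 32))
high_dim = rng.normal(size=(400, 32))

print("mean LID, 2-D subspace:", lid_mle(low_dim).mean())
print("mean LID, full 32-D   :", lid_mle(high_dim).mean())
```

The first mean should come out noticeably smaller than the second, because the estimator reflects how many directions the data locally occupies. In a CoLafier-like setting, `batch` would hold the internal representations that LID-dis produces for one augmented view, and the resulting per-sample scores would then feed the weighting and label-update steps described above.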

CoLafier: Collaborative Noisy Label Purifier With Local Intrinsic Dimensionality Guidance

FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling

A Label Noise Robust Stacked Auto-Encoder Algorithm for Inaccurate Supervised Classification Problems

Co-LDL: A Co-Training-Based Label Distribution Learning Method for Tackling Label Noise

Learning from Noisy Labels with Decoupled Meta Label Purifier

CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition.

An Edge-Assisted Federated Contrastive Learning Method with Local Intrinsic Dimensionality in Noisy Label Environment

Leveraging Noisy Labels of Nearest Neighbors for Label Correction and Sample Selection

Invariant Feature Based Label Correction for DNN when Learning with Noisy Labels

ACE: A Coarse-to-Fine Learning Framework for Reliable Representation Learning Against Label Noise

Learning With Noisy Labels Over Imbalanced Subpopulations

Fine-Grained Classification with Noisy Labels

Collaborative Learning with Corrupted Labels.

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling

Decoding class dynamics in learning with noisy labels

Robust Testing for Deep Learning using Human Label Noise

Improving deep label noise learning with dual active label correction

Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning

FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

Collaborative Label Correction Via Entropy Thresholding

Co-learning: Learning from Noisy Labels with Self-supervision