Abstract: Noisy labels significantly hinder the accuracy and generalization of machine learning models, particularly when instance features are ambiguous. Traditional techniques that attempt to correct noisy labels directly, such as those using transition matrices, often fail to address the inherent complexities of the problem. In this paper, we introduce EchoAlign, a paradigm shift in learning from noisy labels. Instead of focusing on label correction, EchoAlign treats noisy labels ($\tilde{Y}$) as accurate and modifies the corresponding instance features ($X$) to achieve better alignment with $\tilde{Y}$. EchoAlign has two core components. (1) EchoMod: Employing controllable generative models, EchoMod modifies instances while maintaining their intrinsic characteristics and ensuring alignment with the noisy labels. (2) EchoSelect: Instance modification inevitably introduces distribution shifts between training and test sets. EchoSelect retains a significant portion of clean original instances to mitigate these shifts. It leverages the distinct feature similarity distributions of original and modified instances as a tool for accurate sample selection. In environments with 30% instance-dependent noise, even at 99% selection accuracy, EchoSelect retains nearly twice as many samples as the previous best method. On three datasets, EchoAlign surpasses previous state-of-the-art techniques by a substantial margin.

What problem does this paper attempt to address?

The paper addresses the problem of learning from noisy labels. Existing methods either try to correct the erroneous labels directly or address the problem indirectly by modeling the noise process. These methods often fail to handle the complexity that noisy labels introduce, especially when the noise is instance-dependent.

The paper proposes a framework called EchoAlign that changes how noisy labels are handled. Its core idea is not to correct the noisy labels but to treat them as accurate and adjust the corresponding instance features to align with them. This rests on the assumption that even incorrect labels still carry some information about the true labels, so adjusting the instance features to match that information can improve the model's performance.

EchoAlign consists of two key components:

1. **EchoMod**: Uses a controllable generative model to modify instances so that they align with the noisy labels while keeping the original characteristics of the instances as unchanged as possible. This helps preserve the overall quality and consistency of the training data.
2. **EchoSelect**: Addresses the distribution shift between the training set and the test set that instance modification may cause. EchoSelect retains a portion of the original, correctly labeled instances to balance original and modified instances in the training data. It selects samples using the difference in feature similarity distributions between instances before and after modification.

The paper also provides theoretical analysis supporting the effectiveness of instance modification and shows experimentally that EchoAlign has significant advantages over existing techniques under various types of label noise. In cases of high instance-dependent noise in particular, EchoAlign retains more samples while maintaining high selection accuracy, which improves the model's accuracy and generalization.
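
The selection step described for EchoSelect can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each instance has a feature vector before and after modification, that cosine similarity is the similarity measure, and that a fixed threshold separates the two groups. The function names (`cosine_similarity_rows`, `echo_select_sketch`, `build_training_features`) and the threshold value are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_similarity_rows(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (n, d) feature matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, eps)  # eps guards against zero vectors

def echo_select_sketch(orig_feats, mod_feats, threshold=0.5):
    """Hypothetical selection rule (an assumption, not the paper's exact rule).

    If modifying an instance to fit its noisy label barely changes its
    features, the original instance probably already matched the label,
    so the original is kept. Otherwise the modified instance is used.
    """
    sims = cosine_similarity_rows(orig_feats, mod_feats)
    keep_original = sims >= threshold
    return keep_original, sims

def build_training_features(orig_feats, mod_feats, keep_original):
    """Mix original and modified features according to the selection mask."""
    orig_feats = np.asarray(orig_feats, dtype=float)
    mod_feats = np.asarray(mod_feats, dtype=float)
    return np.where(keep_original[:, None], orig_feats, mod_feats)

# Toy features: rows 0 and 2 barely change after modification, row 1 changes a lot.
orig = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mod = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.9]])
keep, sims = echo_select_sketch(orig, mod, threshold=0.5)
mixed = build_training_features(orig, mod, keep)
```

In this toy run, rows 0 and 2 keep their original features and row 1 is replaced by its modified version. In practice the threshold would have to be tuned, since it trades selection accuracy against the number of original samples retained.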

Can We Treat Noisy Labels as Accurate?

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Learning from Noisy Labels with Decoupled Meta Label Purifier

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels

NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification

Improving Speaker Verification with Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

Error-Bounded Correction of Noisy Labels

Dynamic training for handling textual label noise

Learning to Detect Noisy Labels Using Model-Based Features

Improving deep label noise learning with dual active label correction

Which Strategies Matter for Noisy Label Classification? Insight into Loss and Uncertainty

Understanding Instance-Level Label Noise: Disparate Impacts and Treatments

Learning with Feature-Dependent Label Noise: A Progressive Approach

Holistic Label Correction for Noisy Multi-Label Classification

One-step Noisy Label Mitigation

Hierarchical Noise-Tolerant Meta-Learning With Noisy Labels

Which is More Effective in Label Noise Cleaning, Correction or Filtering?

Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning with Label Noise