Can We Treat Noisy Labels as Accurate?

Yuxiang Zheng, Zhongyi Han, Yilong Yin, Xin Gao, Tongliang Liu
2024-05-22
Abstract: Noisy labels significantly hinder the accuracy and generalization of machine learning models, particularly due to ambiguous instance features. Traditional techniques that attempt to correct noisy labels directly, such as those using transition matrices, often fail to address the inherent complexities of the problem sufficiently. In this paper, we introduce EchoAlign, a transformative paradigm shift in learning from noisy labels. Instead of focusing on label correction, EchoAlign treats noisy labels ($\tilde{Y}$) as accurate and modifies corresponding instance features ($X$) to achieve better alignment with $\tilde{Y}$. EchoAlign's core components are (1) EchoMod: Employing controllable generative models, EchoMod precisely modifies instances while maintaining their intrinsic characteristics and ensuring alignment with the noisy labels. (2) EchoSelect: Instance modification inevitably introduces distribution shifts between training and test sets. EchoSelect maintains a significant portion of clean original instances to mitigate these shifts. It leverages the distinct feature similarity distributions between original and modified instances as a robust tool for accurate sample selection. This integrated approach yields remarkable results. In environments with 30% instance-dependent noise, even at 99% selection accuracy, EchoSelect retains nearly twice the number of samples compared to the previous best method. Notably, on three datasets, EchoAlign surpasses previous state-of-the-art techniques with a substantial improvement.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of learning from noisy labels. Existing methods either try to correct the erroneous labels directly or handle them indirectly by modeling the noise process. Both approaches often fall short of the complexity that noisy labels introduce, especially when the noise is instance-dependent.

The paper proposes a framework called EchoAlign that changes how noisy labels are handled. Rather than correcting the noisy labels, EchoAlign treats them as accurate and adjusts the corresponding instance features so that they align with those labels. The approach rests on the assumption that even incorrect labels still carry some information about the true labels, so adjusting instance features to match that information can improve model performance.

EchoAlign consists of two key components:

1. **EchoMod**: Uses a controllable generative model to modify instances so that they align with the noisy labels while keeping the instances' original characteristics as intact as possible. This helps preserve the overall quality and consistency of the training data.
2. **EchoSelect**: Addresses the distribution shift between the training set and the test set that instance modification can introduce. EchoSelect retains a portion of the original instances that are already clean, balancing original and modified instances in the training data. It performs this sample selection using the difference in feature similarity distributions between original and modified instances.

The paper also provides a theoretical analysis supporting the instance modification approach and reports experiments in which EchoAlign outperforms existing techniques across several types of label noise. Under heavy instance-dependent noise in particular, EchoSelect retains more samples while maintaining high selection accuracy, which improves the model's accuracy and generalization.
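The selection idea behind EchoSelect can be illustrated with a minimal sketch. The summary above only states that selection relies on feature similarity between original and modified instances. The rest is an assumption made for illustration, not the paper's implementation: the use of cosine similarity, the fixed `threshold`, the rule "keep the original when similarity is high", and the function names `cosine_similarity` and `select_instances`. The intuition assumed here is that an instance whose label was already correct changes little when modified to match that label, so its features stay similar.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_instances(original_feats, modified_feats, threshold=0.9):
    """Decide per instance whether to train on the original or the modified version.

    Assumption (hypothetical rule): high similarity between the original and
    modified features suggests the noisy label already fit the instance, so
    the original is kept; low similarity suggests the label was wrong, so the
    modified instance is used instead.
    """
    choices = []
    for f_orig, f_mod in zip(original_feats, modified_feats):
        sim = cosine_similarity(f_orig, f_mod)
        choices.append("original" if sim >= threshold else "modified")
    return choices


if __name__ == "__main__":
    # Toy feature vectors; a real pipeline would take them from a feature extractor.
    original = [[1.0, 0.0], [1.0, 0.0]]
    modified = [[0.9, 0.1], [0.0, 1.0]]
    print(select_instances(original, modified))
```

In this toy input the first instance barely changes under modification and is kept in its original form, while the second changes completely and is replaced by its modified version. A fixed threshold is the simplest possible choice; a threshold derived from the observed similarity distributions would be a natural alternative.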