Improving deep label noise learning with dual active label correction

Shao-Yuan Li, Ye Shi, Sheng-Jun Huang, Songcan Chen
DOI: https://doi.org/10.1007/s10994-021-06081-9
IF: 5.414
2022-01-06
Machine Learning
Abstract: Label noise is now a common problem in many applications and can significantly degrade learning performance. To deal with label noise, Active Label Correction (ALC) was proposed to query the true labels of a small subset of instances. As obtaining true labels can be costly, the focus of ALC is to maximally improve learning performance with minimal query cost. Existing ALC methods mainly proceed by querying the most likely mislabeled instances, or by using criteria derived from standard active learning. In this paper, we focus on deep neural network models and show that, due to their intrinsic memorization effect, the true labels of a large proportion of mislabeled instances can be correctly predicted with early-stopped training, even under severe noise. Inspired by this, we propose to train deep label noise learning models robustly with dual ALC (DALC). On the one hand, we select the most useful instances for classifier improvement and query their true labels from external experts. On the other hand, because of the sampling bias introduced by active data selection, the label noise model estimate can be highly biased, which may in turn hurt classifier learning. To alleviate this issue, we propose to identify the instances whose true labels are most likely predicted correctly by the classifier, and take the predictions as their true labels. Integrating the two sources of true labels, we experiment on multiple benchmark datasets with various label noise rates and show the effectiveness of the proposed DALC in both classification accuracy and label noise model estimation. The code is available at https://github.com/lilylisy/mlj21DALC.
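The two label sources described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection criteria, so the sketch assumes plain uncertainty sampling (lowest top-class probability) for the expert queries and a fixed confidence threshold for accepting the classifier's own predictions. The function name `dual_select` and the parameters `query_budget` and `confidence_threshold` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dual_select(
    probs: Sequence[Sequence[float]],
    query_budget: int,
    confidence_threshold: float = 0.9,
) -> Tuple[List[int], Dict[int, int]]:
    """Split instances into expert queries and self-corrected labels.

    probs[i] holds the class probabilities predicted for instance i by an
    early-stopped classifier. Both selection rules are assumptions made for
    illustration, not the criteria used in the paper.
    """
    # Top-class probability and predicted class for every instance.
    confidences = [max(p) for p in probs]
    predictions = [max(range(len(p)), key=lambda k: p[k]) for p in probs]

    # Source 1 (assumed criterion): send the least confident instances
    # to the external expert, up to the query budget.
    by_uncertainty = sorted(range(len(probs)), key=lambda i: confidences[i])
    query_indices = by_uncertainty[:query_budget]

    # Source 2 (assumed criterion): for the remaining instances, trust the
    # classifier's prediction when its confidence reaches the threshold.
    queried = set(query_indices)
    pseudo_labels = {
        i: predictions[i]
        for i in range(len(probs))
        if i not in queried and confidences[i] >= confidence_threshold
    }
    return query_indices, pseudo_labels


if __name__ == "__main__":
    example_probs = [
        [0.98, 0.01, 0.01],
        [0.40, 0.35, 0.25],
        [0.05, 0.90, 0.05],
        [0.34, 0.33, 0.33],
        [0.10, 0.10, 0.80],
    ]
    to_query, corrected = dual_select(example_probs, query_budget=2)
    print("query from expert:", to_query)
    print("take predictions as labels:", corrected)
```

Instances that fall into neither set keep their original, possibly noisy, labels in this sketch. How the two sets feed back into the label noise model is left out, since the abstract does not describe it.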
computer science, artificial intelligence