Label-noise Learning Via Mixture Proportion Estimation

Qinghua ZHENG, Shuzhi CAO, Jianfei RUAN, Rui ZHAO, Bo DONG
DOI: https://doi.org/10.1360/ssi-2023-0126
2023-01-01
Scientia Sinica Informationis
Abstract: With the rise of artificial intelligence in recent years and the accompanying improvement in hardware computing power, deep learning has emerged as the new paradigm for artificial intelligence algorithms. In realistic multi-class classification scenarios, deep learning relies heavily on the availability of massive amounts of manually labeled data; labeling costs and privacy protections, however, often make it difficult to obtain enough appropriately labeled data. Recently, crowdsourcing and web crawling have provided an easy way to collect large amounts of labeled data, but they inevitably introduce label noise. Because deep neural networks have a high capacity to fit noisy labels, training them robustly on such data is challenging. For robust learning, existing works commonly rely, explicitly or implicitly, on a given set of anchor points, i.e., instances that almost certainly belong to a particular true class. Unfortunately, anchor points are difficult to obtain in practice, which makes these works fragile. To address this problem, we build an anchor-free, statistically consistent algorithm for learning in the presence of label noise by recasting the multi-class label-noise learning problem as a mixture proportion estimation (MPE) problem. This paper makes the following contributions: (i) we generalize, for the first time, the existing Regrouping-MPE (R-MPE) method, which is suitable only for two-component scenarios, and propose a multi-component oriented R-MPE (MR-MPE) method that does not rely on the commonly used irreducibility assumption; and (ii) from a theoretical perspective, we demonstrate that the anchor point hypothesis for label-noise learning is equivalent to the irreducibility hypothesis for MPE problems in the context of multi-class classification. An anchor-free, statistically consistent label-noise learning algorithm is then constructed based on the proposed MR-MPE method. Comparative experiments with existing algorithms are conducted on both synthetic and real-world noisy datasets. The results demonstrate that the proposed algorithm achieves the best performance on multiple datasets. Additionally, the robustness of the proposed algorithm is verified when anchor points are removed.