SemiNLL: A Framework of Noisy-Label Learning by Semi-Supervised Learning

Zhuowei Wang, Jing Jiang, Bo Han, Lei Feng, Bo An, Gang Niu, Guodong Long
DOI: https://doi.org/10.48550/arXiv.2012.00925
2020-12-02
Abstract: Deep learning with noisy labels is a challenging task. Recent prominent methods that build on a specific sample selection (SS) strategy and a specific semi-supervised learning (SSL) model achieved state-of-the-art performance. Intuitively, better performance could be achieved if stronger SS strategies and SSL models are employed. Following this intuition, one might easily derive various effective noisy-label learning methods using different combinations of SS strategies and SSL models, which is, however, reinventing the wheel in essence. To prevent this problem, we propose SemiNLL, a versatile framework that combines SS strategies and SSL models in an end-to-end manner. Our framework can absorb various SS strategies and SSL backbones, utilizing their power to achieve promising performance. We also instantiate our framework with different combinations, which set the new state of the art on benchmark-simulated and real-world datasets with noisy labels.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of training deep neural networks (DNNs) in the presence of label noise. Specifically, it focuses on how semi-supervised learning (SSL) methods can mitigate the negative impact of label noise on DNN training. Label noise means that the labels of some samples in the training dataset are wrong or inaccurate. This is very common when constructing large-scale datasets, especially when data is collected through online search engines or crowdsourcing. Because label noise can seriously degrade model performance, handling it effectively has become an important research topic.
The paper proposes a new framework, SemiNLL, that combines sample selection (SS) strategies with SSL models to make more efficient use of all samples, including those with potentially inaccurate labels. The core idea is to transform the noisy-label problem into a semi-supervised learning problem: an SS strategy identifies "clean" samples, which are used as labeled data, while the remaining samples are treated as unlabeled data, and an SSL technique is then applied to train the model. This approach uses the information in all samples while reducing the negative impact of label noise.
The SemiNLL framework is highly flexible and can incorporate various SS strategies and SSL models to achieve better performance. The paper also proposes two specific instantiations, DivideMix+ and GPL, each based on a different combination of SS strategy and SSL model, and both demonstrate superior performance on multiple benchmark datasets.
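The core idea described above (select likely-clean samples, keep their labels, and treat everything else as unlabeled) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `small_loss_selection` and `split_for_ssl` are hypothetical, and the small-loss criterion is only an assumed stand-in for whichever SS strategy is plugged into the framework. The output of the split would then be handed to an SSL training step, which is omitted here.

```python
import numpy as np


def small_loss_selection(losses, keep_ratio):
    """Hypothetical SS strategy (an assumption, not taken from the summary):
    mark the keep_ratio fraction of samples with the smallest loss as clean."""
    losses = np.asarray(losses, dtype=float)
    num_keep = int(round(keep_ratio * len(losses)))
    clean_mask = np.zeros(len(losses), dtype=bool)
    # argsort is ascending, so the first num_keep indices have the smallest loss
    clean_mask[np.argsort(losses)[:num_keep]] = True
    return clean_mask


def split_for_ssl(inputs, noisy_labels, clean_mask):
    """Turn a noisy batch into a labeled set and an unlabeled set.

    Samples selected as clean keep their labels; the labels of all other
    samples are discarded so an SSL model can treat them as unlabeled data.
    """
    inputs = np.asarray(inputs)
    noisy_labels = np.asarray(noisy_labels)
    labeled = (inputs[clean_mask], noisy_labels[clean_mask])
    unlabeled = inputs[~clean_mask]
    return labeled, unlabeled


# Toy batch: four samples with per-sample losses from the current model.
inputs = np.arange(8).reshape(4, 2)
noisy_labels = np.array([0, 1, 0, 1])
losses = [0.1, 2.3, 0.2, 1.9]

mask = small_loss_selection(losses, keep_ratio=0.5)
(labeled_x, labeled_y), unlabeled_x = split_for_ssl(inputs, noisy_labels, mask)
```

In this toy batch the two samples with the smallest losses (indices 0 and 2) form the labeled set, and the other two are passed on without labels. Swapping in a different selection function or a different SSL trainer leaves the split step unchanged, which is the kind of modularity the framework is described as providing.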