Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network

Shuo Yang,Erkun Yang,Bo Han,Yang Liu,Min Xu,Gang Niu,Tongliang Liu
DOI: https://doi.org/10.48550/arXiv.2105.13001
2022-07-14
Abstract:In label-noise learning, estimating the transition matrix is a hot topic as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., clean-label transition matrix (CLTM)) has been widely exploited to learn a clean label classifier by employing the noisy data. Motivated by that classifiers mostly output Bayes optimal labels for prediction, in this paper, we study to directly model the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. But favorably, Bayes optimal labels have less uncertainty compared with the clean labels, i.e., the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This enables two advantages to estimate the BLTM, i.e., (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of noisy data; (b) the feasible solution space is much smaller. By exploiting the advantages, we estimate the BLTM parametrically by employing a deep neural network, leading to better generalization and superior classification performance.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to estimate the transition matrix from Bayesian optimal labels to noisy labels (i.e., the Bayesian label transition matrix, BLTM) when performing classification tasks in the presence of label noise, in order to construct a statistically consistent classifier. Traditional methods mainly focus on the transition matrix from clean labels to noisy labels (i.e., the clean label transition matrix, CLTM), but these methods are difficult to effectively estimate the transition matrix when only noisy data is available. This paper proposes to directly model the transition matrix from Bayesian optimal labels to noisy labels and use deep neural networks to parameterize and estimate this matrix, thereby improving classification performance. ### Main Contributions 1. **Research on Bayesian Label Transition Matrix**: Compared with the traditional clean label transition matrix, this paper proposes to study the transition probabilities between Bayesian optimal labels and noisy labels, that is, the Bayesian label transition matrix (BLTM). Since Bayesian optimal labels are deterministic and accessible, this transition matrix is easier to be parameterized and learned. 2. **Utilize Deep Neural Networks**: This paper is the first to propose using deep neural networks to capture noise patterns and generate a transition matrix for each input instance, thereby achieving parameterized estimation of instance - dependent label transition matrices. 3. **Experimental Verification**: The experimental results on three synthetic noise datasets and one large - scale real - world noise dataset show that the proposed method significantly improves classification performance under various experimental settings. ### Method Overview 1. **Bayesian Label Transition Matrix**: Define the transition matrix \(T^*(X)\) from Bayesian optimal labels to noisy labels, where \(T^*_{i,j}(X) = P(\tilde{Y} = j|Y^* = i, X)\) represents the probability that the Bayesian optimal label \(i\) is converted to the noisy label \(j\) under the input \(X\). 2. **Collect Bayesian Optimal Labels**: Use the noisy data distillation method to automatically infer a batch of Bayesian optimal labels with theoretical guarantees from the noisy dataset. 3. **Train Bayesian Label Transition Network**: Use the collected Bayesian optimal labels and noisy labels to train a parameterized Bayesian label transition network to estimate the instance - dependent Bayesian label transition matrix. 4. **Forward - Correction Training Classifier**: Combine the trained Bayesian label transition network and train the classifier through the forward - correction method to minimize the empirical risk. ### Experimental Results The experimental results on multiple datasets show that the proposed method has achieved significant performance improvements on both synthetic noise datasets and real - world noise datasets. Specifically, it is manifested in higher classification accuracy and better generalization ability. ### Conclusion By introducing the Bayesian label transition matrix and using deep neural networks for parameterized estimation, this paper successfully solves the problem of constructing a statistically consistent classifier in the label - noise environment, providing a new and effective method for processing large - scale noisy datasets.