Hypothesis Testing for Class-Conditional Noise Using Local Maximum Likelihood

Weisong Yang, Rafael Poyiadzi, Niall Twomey, Raul Santos Rodriguez
2023-12-16
Abstract: In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question. In certain particular cases, hypothesis testing procedures have been proposed to assess whether a given instance-label dataset is contaminated with class-conditional label noise, as opposed to uniform label noise. The existing theory builds on the asymptotic properties of the Maximum Likelihood Estimate for parametric logistic regression. However, the parametric assumptions on top of which these approaches are constructed are often too strong and unrealistic in practice. To alleviate this problem, in this paper we propose an alternative path by showing how similar procedures can be followed when the underlying model is a product of Local Maximum Likelihood Estimation that leads to more flexible nonparametric logistic regression models, which in turn are less susceptible to model misspecification. This different view allows for wider applicability of the tests by offering users access to a richer model class. Similarly to existing works, we assume we have access to anchor points which are provided by the users. We introduce the necessary ingredients for the adaptation of the hypothesis tests to the case of nonparametric logistic regression and empirically compare against the parametric approach presenting both synthetic and real-world case studies and discussing the advantages and limitations of the proposed approach.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to assess label quality in supervised learning before any model is trained. Specifically, the researchers propose a nonparametric hypothesis testing method based on Local Maximum Likelihood Estimation (LMLE) to test whether a dataset is contaminated with Class-Conditional Noise (CCN), as opposed to Uniform Noise (UN). Compared with existing hypothesis tests built on parametric logistic regression, this approach is more flexible and less susceptible to model misspecification. As in earlier work, it assumes that users supply anchor points. The core contributions of the paper are:

1. **Extension of Hypothesis Testing Methods**: The existing tests for class-conditional noise are adapted to a nonparametric logistic regression model fitted by Local Maximum Likelihood Estimation.
2. **Comparison of Two Methods**: The parametric and nonparametric approaches are compared empirically, and the advantages and limitations of each are discussed.
3. **Empirical Analysis**: The proposed method is evaluated on synthetic and real-world case studies, and considerations for practical use are discussed.

Through this work, the paper aims to give machine learning practitioners a more robust tool for evaluating the quality of dataset labels.
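To make the local maximum likelihood idea concrete, the sketch below fits a logistic model at a single query point by maximising a kernel-weighted log-likelihood with Newton-Raphson steps. It is an illustration only, not the authors' implementation: the function name `local_logistic_fit`, the Gaussian kernel, the local-linear form, the bandwidth value, and the small ridge term are all assumptions chosen for the example, and the hypothesis test itself is not reproduced here.

```python
import numpy as np


def local_logistic_fit(X, y, x0, bandwidth=0.5, n_iter=50, ridge=1e-6):
    """Estimate P(y = 1 | x0) with a local-linear logistic fit (hypothetical helper).

    Maximises sum_i K_h(x_i - x0) * [y_i * eta_i - log(1 + exp(eta_i))],
    where eta_i = b0 + b^T (x_i - x0) and K_h is a Gaussian kernel.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = X - x0
    # Gaussian kernel weights: points near x0 dominate the likelihood.
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / bandwidth ** 2)
    # Local design matrix: intercept plus features centred at x0.
    Z = np.hstack([np.ones((X.shape[0], 1)), diffs])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (w * (y - p))
        # Weighted Hessian; the ridge term only stabilises the linear solve.
        hess = (Z * (w * p * (1.0 - p))[:, None]).T @ Z
        hess += ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    # Because features are centred at x0, the intercept gives the estimate there.
    return 1.0 / (1.0 + np.exp(-beta[0]))


# Synthetic check on a nonlinear posterior that a global linear logit would miss.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
true_prob = 1.0 / (1.0 + np.exp(-2.0 * np.sin(X[:, 0])))
y = (rng.uniform(size=2000) < true_prob).astype(float)

x0 = np.array([1.0])
p_hat = local_logistic_fit(X, y, x0, bandwidth=0.3)
p_true = 1.0 / (1.0 + np.exp(-2.0 * np.sin(1.0)))
print(f"estimate at x0: {p_hat:.3f}, true value: {p_true:.3f}")
```

The bandwidth controls the usual trade-off: a smaller value tracks local structure more closely but uses fewer effective samples, which increases the variance of the estimate at each query point.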