Rejection via Learning Density Ratios

Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen
2024-05-29
Abstract: Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $\phi$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $\phi$-divergences are specified by the family of $\alpha$-divergences. Our framework is tested empirically over clean and noisy datasets.
Machine Learning
What problem does this paper attempt to address?
The paper addresses a key problem in deploying machine learning models: how to let a model abstain from predicting when it is uncertain or likely to be wrong. This matters in applications such as autonomous driving, product quality inspection, and medical diagnosis, where incorrect predictions can have severe consequences.

To solve this problem, the paper proposes a framework that derives the rejection decision from a learned idealized data distribution. Specifically, given a pretrained model and its loss function, the authors seek a data distribution under which the model performs best, then compare it with the actual data distribution to decide whether to reject an input. The comparison is made through the density ratio between the two distributions.

The main contributions of the paper are:
1. Proposing a new learning-to-reject framework based on the density ratio of an idealized distribution, which is similar to the distributions studied in Distributionally Robust Optimization (DRO) and Generalized Variational Inference (GVI).
2. Demonstrating that the rejection strategy learned under this framework can theoretically recover optimal rejection strategies, such as Chow's rule.
3. Deriving the idealized distributions generated by α-divergences.
4. Providing a set of simplifying assumptions that make the framework practical, and validating these assumptions through experiments.

The paper also discusses how to implement these theoretical results in practice, including how to estimate the loss, determine the normalization constant, and tune the rejection threshold τ.
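To make the density-ratio comparison concrete, one special case can be written out. This is a sketch under assumptions not stated in this summary: the regularizer is taken to be the KL divergence $\mathrm{KL}(Q \,\|\, P)$ (the direction of the divergence and the symbols $P$, $Q$, $L$, $\lambda$, $Z$, $\rho$ are chosen here for illustration), $P$ is the data distribution, $L(x)$ is the model's expected loss at input $x$, and $\lambda > 0$ is the regularization strength. The paper's general α-divergence result has a different form and is not reproduced here.

$$
Q^\star \in \arg\min_{Q} \; \mathbb{E}_{x \sim Q}[L(x)] + \lambda\, \mathrm{KL}(Q \,\|\, P)
\quad\Longrightarrow\quad
\rho(x) = \frac{\mathrm{d}Q^\star}{\mathrm{d}P}(x) = \frac{\exp(-L(x)/\lambda)}{Z},
\qquad
Z = \mathbb{E}_{x \sim P}\big[\exp(-L(x)/\lambda)\big].
$$

In this special case, an input would be rejected when $\rho(x) < \tau$: the idealized distribution down-weights inputs with high loss, so a small ratio signals a likely error.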
Finally, a series of experiments were conducted to evaluate the effectiveness of the proposed density ratio rejector, covering standard classification tasks as well as tasks with introduced label noise.
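
The density-ratio rejection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a KL-regularized special case, in which the ratio between the idealized and data distributions is proportional to `exp(-loss / lam)`, estimates the normalization constant by a sample mean, and uses `1 - max probability` as a stand-in estimate of the 0-1 loss. The function names, `lam`, `tau`, and the toy probabilities are all hypothetical.

```python
import numpy as np


def density_ratio_kl(loss, lam):
    """Estimate rho(x) = exp(-L(x)/lam) / Z for a batch of per-input losses.

    Assumption: KL-regularized special case; Z is approximated by the
    sample mean of the unnormalized weights over the batch.
    """
    loss = np.asarray(loss, dtype=float)
    weights = np.exp(-loss / lam)
    return weights / weights.mean()


def reject(loss, lam, tau):
    """Return a boolean mask: True where the input is rejected (rho < tau)."""
    return density_ratio_kl(loss, lam) < tau


# Toy class-probability outputs from a hypothetical pretrained classifier.
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
])

# Stand-in estimate of the expected 0-1 loss: one minus the top probability.
est_loss = 1.0 - probs.max(axis=1)

rho = density_ratio_kl(est_loss, lam=0.5)
mask = reject(est_loss, lam=0.5, tau=1.0)
print(mask)
```

With this loss estimate the ratio increases with the model's top-class probability, so thresholding it behaves like a confidence threshold in the style of Chow's rule. Other loss estimates or divergences would give different rejection regions.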