Abstract: Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $\phi$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $\phi$-divergences are specified by the family of $\alpha$-divergences. Our framework is tested empirically over clean and noisy datasets.
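The core mechanism in the abstract can be made concrete in its simplest case. This is a minimal sketch under one assumption: the regularizer is the KL divergence (a limiting case of the $\alpha$-divergence family), not the general $\alpha$-divergences the paper studies. Minimizing $\mathbb{E}_Q[\ell] + \lambda\,\mathrm{KL}(Q \,\|\, P)$ over $Q$ gives the Gibbs distribution $Q \propto P\, e^{-\ell/\lambda}$. The density ratio is therefore $e^{-\ell/\lambda}/Z$, and inputs whose ratio falls below a threshold $\tau$ are rejected. The function names, the temperature `lam`, and the loss values are hypothetical.

```python
import math

def gibbs_density_ratio(losses, lam):
    """Density ratio dQ/dP of a KL-regularized idealized distribution.

    Minimizing E_Q[loss] + lam * KL(Q || P) gives Q(x) proportional to
    P(x) * exp(-loss(x) / lam). Here P is the empirical distribution of the
    inputs, so the ratio is exp(-loss / lam) / Z with Z the sample mean.
    """
    shift = min(losses)  # a constant shift cancels in the ratio; it only avoids underflow
    weights = [math.exp(-(loss - shift) / lam) for loss in losses]
    z = sum(weights) / len(weights)  # empirical normalization constant
    return [w / z for w in weights]

def reject(ratios, tau):
    """Abstain (True) wherever the density ratio falls below the threshold tau."""
    return [r < tau for r in ratios]

# Hypothetical per-input loss estimates from a pretrained model.
losses = [0.05, 0.10, 0.90, 2.00]
ratios = gibbs_density_ratio(losses, lam=0.5)
print(reject(ratios, tau=0.5))  # [False, False, True, True]
```

Low-loss inputs receive ratios above 1, meaning the idealized distribution up-weights them. High-loss inputs fall below $\tau$. A smaller `lam` makes the ratio more sensitive to the loss.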

What problem does this paper attempt to address?

The paper primarily addresses a key issue that machine learning models face at prediction time: how to let a model abstain from predicting when it is uncertain or its prediction is likely to be wrong. This matters most in applications such as autonomous driving, product quality inspection, and medical diagnosis, where incorrect predictions can have severe consequences.

To solve this problem, the paper proposes a framework that gives a model a rejection option by learning an idealized data distribution. Specifically, given a pretrained model and its loss function, the authors seek the data distribution under which the model performs best. They then compare it with the actual data distribution to decide whether to reject an input, using the density ratio between the two distributions.

The main contributions of the paper are:
1. Proposing a new framework for learning with rejection based on the density ratio of an idealized distribution, similar to the distributions considered in Distributionally Robust Optimization (DRO) and Generalized Variational Inference (GVI).
2. Showing that rejectors learned under this framework can theoretically recover optimal rejection strategies such as Chow's rule.
3. Deriving the idealized distributions induced by α-divergences.
4. Providing a set of simplifying assumptions that make the framework practical, and validating them through experiments.

The paper also discusses how to put these theoretical results into practice, including how to estimate the loss function, compute the normalization constant, and tune the rejection threshold τ.
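For the α-divergence case, a closed form similar to the KL one can be sketched. The sketch makes two assumptions: α > 1, and the generator $\phi_\alpha(t) = \frac{t^\alpha - \alpha t + \alpha - 1}{\alpha(\alpha - 1)}$, which is one common parameterization and may not match the paper's convention. Minimize $\mathbb{E}_P[\rho\,\ell] + \lambda\,\mathbb{E}_P[\phi_\alpha(\rho)]$ subject to $\mathbb{E}_P[\rho] = 1$ and $\rho \ge 0$. The stationarity conditions give $\rho(x) = \big[1 + (\alpha - 1)(\eta - \ell(x))/\lambda\big]_+^{1/(\alpha-1)}$, where the normalization constant $\eta$ is found numerically by bisection.

The sketch also makes a hypothetical choice of loss estimate: the expected 0-1 loss $1 - \max_y p(y \mid x)$. It then compares the resulting rejector with Chow's rule. All names and numbers are illustrative.

```python
def alpha_density_ratio(losses, lam, alpha, iters=200):
    """Density ratio rho = dQ/dP of an alpha-divergence-regularized idealized distribution.

    Assumes alpha > 1 and the generator
    phi(t) = (t**alpha - alpha*t + alpha - 1) / (alpha*(alpha - 1)), for which
    rho = max(0, 1 + (alpha - 1) * (eta - loss) / lam) ** (1 / (alpha - 1)).
    P is the empirical distribution, and eta is set by bisection so mean(rho) == 1.
    """
    if alpha <= 1:
        raise ValueError("this sketch only handles alpha > 1")

    def ratios(eta):
        return [max(0.0, 1 + (alpha - 1) * (eta - loss) / lam) ** (1 / (alpha - 1))
                for loss in losses]

    # mean(ratios(eta)) is nondecreasing in eta, so bracket the root of mean == 1.
    lo = min(losses) - lam / (alpha - 1)  # every ratio is 0 here
    hi = max(losses)                      # every ratio is >= 1 here
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(ratios(mid)) / len(losses) < 1:
            lo = mid
        else:
            hi = mid
    return ratios(hi)

def chow_reject(probs, cost):
    """Chow's rule: abstain when the top class probability is below 1 - cost."""
    return max(probs) < 1 - cost

# Hypothetical softmax outputs of a pretrained classifier on four inputs.
probs = [[0.97, 0.02, 0.01], [0.80, 0.15, 0.05], [0.50, 0.30, 0.20], [0.40, 0.35, 0.25]]
losses = [1 - max(p) for p in probs]  # assumed loss estimate: expected 0-1 loss
rho = alpha_density_ratio(losses, lam=0.5, alpha=2.0)
print([r < 1.0 for r in rho])                     # [False, False, True, True]
print([chow_reject(p, cost=0.3) for p in probs])  # [False, False, True, True]
```

In this sketch, ρ is nonincreasing in the loss estimate. Thresholding ρ at τ therefore means rejecting whenever $\max_y p(y \mid x)$ falls below some level, which is a Chow-style rule with an implied cost. Here both rules reject the last two inputs.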
Finally, the authors conduct a series of experiments to evaluate the proposed density-ratio rejector, covering both standard classification tasks and tasks with injected label noise.
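Evaluating a rejector under label noise needs two ingredients: a way to corrupt labels and a way to score abstention. The sketch below assumes symmetric label noise, where each label is flipped to a uniformly chosen other class with probability `rate`. It scores abstention with the common coverage and accepted-accuracy metrics. The paper's exact noise model and evaluation protocol may differ, and the function names are hypothetical.

```python
import random

def add_symmetric_noise(labels, num_classes, rate, seed=0):
    """Flip each label, with probability `rate`, to a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            # Draw from the num_classes - 1 classes other than y.
            y_new = rng.randrange(num_classes - 1)
            noisy.append(y_new if y_new < y else y_new + 1)
        else:
            noisy.append(y)
    return noisy

def coverage_and_accuracy(preds, labels, rejected):
    """Coverage = fraction of inputs accepted; accuracy is measured on accepted inputs only."""
    accepted = [(p, y) for p, y, r in zip(preds, labels, rejected) if not r]
    coverage = len(accepted) / len(preds)
    if not accepted:
        return coverage, float("nan")
    accuracy = sum(p == y for p, y in accepted) / len(accepted)
    return coverage, accuracy

preds = [0, 1, 2, 1]
labels = [0, 1, 1, 2]
rejected = [False, False, True, True]
print(coverage_and_accuracy(preds, labels, rejected))  # (0.5, 1.0)
```

Sweeping τ and plotting accepted accuracy against coverage gives the usual accuracy-coverage trade-off curve for comparing rejectors.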

Rejection via Learning Density Ratios
