Abstract: Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either have sample complexity exponential in $k$, require solving computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form "does the label belong to a subset $T \subseteq [k]$?" (e.g., is this an image of an animal?). It ensures that the probabilities obtained by summing the predicted probabilities of the labels in $T$ are close to those of some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information-theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
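To make the subset-projection idea concrete, the following minimal sketch collapses a $k$-class prediction to the binary task "is the label in $T$?" by summing the probabilities assigned to labels in $T$. It then scores the resulting binary predictor with a plain binned calibration error. The binned error is a stand-in chosen for brevity, not the smooth calibration measure the abstract refers to, and the function names `project_to_subset` and `binned_calibration_error` are hypothetical, introduced only for illustration.

```python
import numpy as np

def project_to_subset(probs, labels, subset):
    """Collapse k-class predictions to the binary task 'is the label in subset?'.

    probs:  (n, k) array, each row a predicted distribution over labels 0..k-1
    labels: (n,) array of true labels
    subset: iterable of label indices, playing the role of T
    """
    idx = np.asarray(sorted(subset))
    p_T = probs[:, idx].sum(axis=1)            # predicted probability that the label is in T
    y_T = np.isin(labels, idx).astype(float)   # 1.0 where the true label is in T
    return p_T, y_T

def binned_calibration_error(p, y, n_bins=10):
    """Binned calibration error (an assumption of this sketch, not the paper's metric):
    the bin-mass-weighted average of |mean outcome - mean prediction| per bin."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(y[mask].mean() - p[mask].mean())
    return err

# Tiny hand-built example with k = 3 and T = {0, 1}.
probs = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
p_T, y_T = project_to_subset(probs, labels, {0, 1})
print(p_T, y_T, binned_calibration_error(p_T, y_T))
```

A downstream decision maker interested in a different subset only needs to change `subset`. The multi-class predictor itself is untouched, which is the sense in which one predictor serves many binary tasks.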

Calibration methods in imbalanced binary classification

From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

Classifier Calibration: A survey on how to assess and improve predicted class probabilities

Confidence Calibration of Classifiers with Many Classes

Cautious Calibration in Binary Classification

On the Calibration of Multiclass Classification with Rejection

Classifier Calibration: with application to threat scores in cybersecurity

Deep learning model calibration for improving performance in class-imbalanced medical image classification tasks

Probabilistic Scores of Classifiers, Calibration is not Enough

Top-label calibration and multiclass-to-binary reductions

Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification

Calibration of Machine Learning Classifiers for Probability of Default Modelling

Towards Fair and Calibrated Models

On Computationally Efficient Multi-Class Calibration

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts

A measure oriented training scheme for imbalanced classification problems

TCE: A Test-Based Approach to Measuring Calibration Error

Optimizing Calibration by Gaining Aware of Prediction Correctness

Empirical analysis of performance assessment for imbalanced classification

Reassessing How to Compare and Improve the Calibration of Machine Learning Models