Abstract: Being cautious is crucial for enhancing the trustworthiness of machine learning systems integrated into decision-making pipelines. Although calibrated probabilities help in optimal decision-making, perfect calibration remains unattainable, leading to estimates that fluctuate between under- and overconfidence. This becomes a critical issue in high-risk scenarios, where even occasional overestimation can lead to extreme expected costs. In these scenarios, it is important for each predicted probability to lean towards underconfidence, rather than just achieving an average balance. In this study, we introduce the novel concept of cautious calibration in binary classification. This approach aims to produce probability estimates that are intentionally underconfident for each predicted probability. We highlight the importance of this approach in a high-risk scenario and propose a theoretically grounded method for learning cautious calibration maps. Through experiments, we explore and compare our method to various approaches, including methods originally not devised for cautious calibration but applicable in this context. We show that our approach is the most consistent in providing cautious estimates. Our work establishes a strong baseline for further developments in this novel framework.

What problem does this paper attempt to address?

### The problems the paper attempts to solve

The paper "Cautious Calibration in Binary Classification" addresses the fact that probability calibration methods cannot achieve perfect calibration in binary classification tasks. Although existing calibration methods can improve decision-making, their estimates still err toward either overconfidence or underconfidence. In high-risk scenarios in particular, occasional overconfidence may lead to extremely high expected costs. The paper therefore proposes a new concept, cautious calibration, which aims to generate intentionally underconfident probability estimates so as to avoid overconfidence in any single prediction.

### Background and motivation

1. **The importance of calibration**:
   - Calibrated probabilities help optimize the decision-making process. In classification models in particular, they help humans better understand the model's predictions.
   - However, existing calibration methods (such as equidistant calibration, logistic calibration, and beta calibration) cannot achieve perfect calibration, so the predicted probabilities fluctuate between overconfidence and underconfidence.
2. **Challenges in high-risk scenarios**:
   - In high-risk scenarios, even occasional overconfidence may lead to serious consequences. For example, in autonomous driving, a model that is overconfident that the road is clear may cause a serious traffic accident.
   - In these scenarios, each individual predicted probability should therefore lean toward underconfidence, rather than merely being balanced on average.

### Solutions

1. **The concept of cautious calibration**:
   - The authors propose the concept of "cautious calibration", with the goal of generating probability estimates that are always biased toward underconfidence, that is, estimates that provide a lower bound on the true calibrated value.
   - This guards against overconfidence in any single prediction, thereby reducing potential risks.
2. **Theoretical basis**:
   - The authors propose a method based on reverse hypothesis testing to calculate the lower bound of probabilities. Specifically, they use monotonic statistics (such as the sum) to calculate the lower bound and prove that these methods are applicable to heterogeneous Bernoulli vectors.
   - By selecting appropriate subsequences from which to calculate the lower bound, the resulting bounds are conservative and come with probability guarantees.
3. **Experimental verification**:
   - The authors compare their method experimentally with other existing methods, including methods that were not originally designed for cautious calibration but can be applied to it. The results show that their method is the most consistent in providing cautious estimates.

### Example scenario

The paper illustrates the necessity of cautious calibration with an example from autonomous driving. In this scenario, the model must select the speed of the car (i.e., the risk level) according to the predicted probability. If the model is overconfident that the road is clear, it may select a higher speed, which may lead to a serious accident when the car encounters an obstacle. Conversely, an underconfident model may select a lower speed than necessary, but it avoids the serious consequences.

### Conclusion

By introducing the concept of cautious calibration, this paper provides a new way to handle probability calibration in high-risk scenarios. The experimental results show that the method is the most consistent in providing cautious estimates, and it offers a strong baseline for future research.
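
The idea of a lower bound obtained by inverting a hypothesis test, described under "Solutions", can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses the standard one-sided Clopper-Pearson bound and assumes that all instances in a bin share one true probability, whereas the paper treats heterogeneous Bernoulli vectors. The function names, the bin counts, and the default `alpha=0.05` are illustrative assumptions.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cautious_lower_bound(k: int, n: int, alpha: float = 0.05, tol: float = 1e-10) -> float:
    """One-sided Clopper-Pearson lower bound on a Bernoulli success probability.

    Inverts the test "H0: true probability = p": every p for which seeing
    at least k positives out of n has probability below alpha is rejected.
    The bound is the largest rejected p, located by bisection.
    Assumption (hypothetical setup): the n labels are i.i.d. Bernoulli.
    """
    if k == 0:
        return 0.0  # no positives observed, so nothing above 0 can be claimed
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail_ge(k, n, mid) < alpha:
            lo = mid  # mid is rejected, so the bound lies above it
        else:
            hi = mid  # mid is plausible, so the bound lies at or below it
    return lo


if __name__ == "__main__":
    # Hypothetical bin: 45 positives among 50 instances with similar scores.
    k, n = 45, 50
    print("empirical frequency:", k / n)
    print("cautious lower bound:", round(cautious_lower_bound(k, n), 4))
```

Reporting the bound instead of the empirical frequency `k / n` trades sharpness for caution: under the i.i.d. assumption, the bound exceeds the true probability with probability at most `alpha`. For example, 10 positives out of 10 give a bound of `0.05 ** (1 / 10)` (about 0.741) rather than 1.0.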
