Abstract: Confidence calibration is central to providing accurate and interpretable uncertainty estimates, especially under safety-critical scenarios. However, we find that existing calibration algorithms often overlook the issue of *proximity bias*, a phenomenon where models tend to be more overconfident in low proximity data (i.e., data lying in the sparse region of the data distribution) compared to high proximity samples, and thus suffer from inconsistent miscalibration across different proximity samples. We examine the problem over 504 pretrained ImageNet models and observe that: 1) Proximity bias exists across a wide variety of model architectures and sizes; 2) Transformer-based models are relatively more susceptible to proximity bias than CNN-based models; 3) Proximity bias persists even after performing popular calibration algorithms like temperature scaling; 4) Models tend to overfit more heavily on low proximity samples than on high proximity samples. Motivated by the empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity. To further quantify the effectiveness of calibration algorithms in mitigating proximity bias, we introduce proximity-informed expected calibration error (PIECE) with theoretical analysis. We show that ProCal is effective in addressing proximity bias and improving calibration on balanced, long-tail, and distribution-shift settings under four metrics over various model architectures. We believe our findings on proximity bias will guide the development of *fairer and better-calibrated* models, contributing to the broader pursuit of trustworthy AI. Our code is available at: https://github.com/MiaoXiong2320/ProximityBias-Calibration

What problem does this paper attempt to address?

This paper addresses confidence calibration in deep neural networks, and in particular a bias tied to sample proximity. It points out that existing calibration algorithms often overlook proximity bias: models tend to be more overconfident on low-proximity data (i.e., data in sparse regions of the data distribution) than on high-proximity samples. As a result, miscalibration is inconsistent across samples of different proximities, which undermines the reliability and interpretability of the model.

The problems the paper targets can be summarized as follows:

1. **Existence of Proximity Bias**: By analyzing 504 pre-trained ImageNet models, the paper finds that proximity bias is widespread and is not limited to specific model architectures or sizes.
2. **Limitations of Existing Calibration Methods**: Commonly used calibration methods such as temperature scaling do not effectively alleviate proximity bias, so calibration error still differs between samples of different proximities.
3. **Safety and Fairness Issues**: Proximity bias can cause safety and fairness issues in practical applications. In high-risk scenarios such as medical diagnosis, misjudgment of minority groups or rare samples may lead to serious consequences.

To address these problems, the paper proposes a new calibration method, ProCal (Proximity-Informed Calibration), which adjusts the model's confidence estimate based on each sample's proximity, with the aim of improving calibration and reliability. The paper also introduces a new evaluation metric, PIECE (Proximity-Informed Expected Calibration Error), to quantify how effectively calibration algorithms mitigate proximity bias.
### Main Contributions of the Paper:

- **Discovering the Proximity Bias Problem**: Through large-scale experiments, it reveals the widespread existence of proximity bias and its impact on model calibration.
- **Proposing a New Evaluation Metric**: Introducing PIECE to better quantify and evaluate the impact of proximity bias on calibration.
- **Developing an Effective Calibration Method**: Proposing the ProCal method, which can significantly improve the calibration of samples with different proximities.

Through these contributions, the paper provides important theoretical and technical support for constructing more reliable, interpretable and fair deep-learning models.
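The idea of inconsistent miscalibration across proximity levels can be illustrated with a small diagnostic. The sketch below is not the paper's ProCal algorithm or its PIECE metric. It assumes, for illustration only, that proximity is scored as the exponential of the negative mean Euclidean distance to a sample's `k` nearest neighbours in some feature space, and it reports the standard binned expected calibration error (ECE) separately for low- and high-proximity groups. The function names `knn_proximity`, `ece`, and `ece_by_proximity` are hypothetical.

```python
import math


def knn_proximity(features, k=10):
    """Score each point by exp(-mean distance to its k nearest other points).

    Assumed definition for illustration; needs at least two points.
    Higher scores mean the point lies in a denser region.
    """
    scores = []
    for i, x in enumerate(features):
        dists = sorted(math.dist(x, y) for j, y in enumerate(features) if j != i)
        nearest = dists[:k]
        scores.append(math.exp(-sum(nearest) / len(nearest)))
    return scores


def ece(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    total = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            total += len(members) / n * abs(accuracy - avg_conf)
    return total


def ece_by_proximity(confidences, correct, proximity, n_groups=2):
    """Return one ECE value per proximity group, ordered from low to high proximity."""
    order = sorted(range(len(proximity)), key=lambda i: proximity[i])
    size = math.ceil(len(order) / n_groups)
    groups = [order[s:s + size] for s in range(0, len(order), size)]
    return [ece([confidences[i] for i in g], [correct[i] for i in g]) for g in groups]
```

Under these assumptions, a markedly larger ECE in the low-proximity group than in the high-proximity group would be consistent with the proximity bias the paper describes. Running the same comparison before and after a calibration step such as temperature scaling is one way to check whether the gap persists.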

Proximity-Informed Calibration for Deep Neural Networks

Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

Rethinking Calibration of Deep Neural Networks: Do Not Be Afraid of Overconfidence

Calibrating Deep Neural Network using Euclidean Distance

A Comparative Study of Confidence Calibration in Deep Learning: From Computer Vision to Medical Imaging

Beyond Calibration: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence

Calibration in Deep Learning: A Survey of the State-of-the-Art

Confidence Calibration for Intent Detection Via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

Deep learning model calibration for improving performance in class-imbalanced medical image classification tasks

Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors

Towards Unbiased Calibration using Meta-Regularization

Multivariate Confidence Calibration for Object Detection

Improved Trainable Calibration Method for Neural Networks on Medical Imaging Classification

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

Improving Uncertainty Calibration of Deep Neural Networks via Truth Discovery and Geometric Optimization

The Calibration Generalization Gap

Revisiting the Calibration of Modern Neural Networks

Test Time Augmentation Meets Post-hoc Calibration: Uncertainty Quantification under Real-World Conditions

On the Calibration of Human Pose Estimation