Unified Uncertainty Calibration

Kamalika Chaudhuri, David Lopez-Paz
2024-01-19
Abstract: To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and does not allow us to correct for misspecifications in our uncertainty estimates. To address these three issues, we introduce unified uncertainty calibration (U2C), a holistic framework to combine aleatoric and epistemic uncertainties. U2C enables a clean learning-theoretical analysis of uncertainty estimation, and outperforms reject-or-classify across a variety of ImageNet benchmarks. Our code is available at: https://github.com/facebookresearch/UnifiedUncertaintyCalibration
Machine Learning
What problem does this paper attempt to address?
The paper asks how an AI system can reliably say "I don't know" when a test example is hard to classify or falls outside the training classes. Specifically, it studies how to combine **aleatoric uncertainty** (the inherent randomness of the data) and **epistemic uncertainty** (the limits of the model's knowledge about the data) to improve the robustness, fairness, and safety of AI systems.

### Main problems of the paper

1. **Uncertainty sources do not communicate**: Existing methods such as the **Reject-or-Classify (RC)** strategy decide whether to reject a prediction based on epistemic uncertainty alone, so easily classifiable samples may be wrongly rejected and anomalous samples wrongly accepted.
2. **Uncalibrated predictions**: RC makes hard decisions (reject entirely or accept entirely), which leaves the confidence attached to its predictions inaccurate.
3. **No way to correct biased uncertainty estimates**: RC has no mechanism for adjusting a misspecified epistemic uncertainty estimate, which hurts the model's performance.

### Solutions

To address these problems, the paper proposes the **Unified Uncertainty Calibration (U2C)** framework. Its main contributions are:

1. **Combining aleatoric and epistemic uncertainty**: U2C combines the two types of uncertainty through a non-linear calibration function and produces an extended softmax vector over all classes, including an additional class that represents "unknown".
2. **Improving calibration**: The probabilities produced by U2C are calibrated across all classes, so the model's confidence is more consistent with its actual accuracy.
3. **Allowing non-linear calibration of epistemic uncertainty**: U2C learns a non-linear calibration function by minimizing the cross-entropy loss, which can correct bias in the epistemic uncertainty estimate and thus improve performance.

### Specific methods

1. **Collecting the validation set**: Draw a fresh validation set from the training distribution, to be used for computing the epistemic uncertainty threshold.
2. **Calculating the threshold**: Relabel the 5% of validation samples with the highest epistemic uncertainty as the "unknown" class.
3. **Learning the non-linear calibration function**: Learn a non-linear epistemic calibration function by minimizing the cross-entropy loss on the relabeled validation set.
4. **Generating the extended softmax vector**: Use the learned calibration function to produce an extended softmax vector over all classes.

### Theoretical analysis

The paper proves that U2C is superior to RC in some cases. In particular, U2C performs better when the test data contain many samples with high aleatoric uncertainty but low epistemic uncertainty. U2C also shows advantages on metrics such as Negative Log-Likelihood (NLL) and Expected Calibration Error (ECE).

### Experimental results

The paper runs experiments on multiple ImageNet benchmarks covering in-domain data, covariate-shift data, near out-of-distribution (OOD) data, and far-OOD data. The results show that U2C outperforms RC on metrics such as classification error and calibration error, especially on OOD data.

### Summary

By proposing the U2C framework, this paper addresses the shortcomings of existing methods for prediction under uncertainty and improves the robustness and reliability of AI systems. U2C has theoretical advantages and also performs better in practice.
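The four steps listed under "Specific methods" can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code lives in the repository linked in the abstract). The synthetic validation data, the quadratic feature map used for the calibration function, and all function names (`relabel_unknown`, `fit_calibration`, `extended_softmax`, `reject_or_classify`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def relabel_unknown(labels, u, num_classes, quantile=0.95):
    # Step 2: threshold at the 95th percentile of u; points above it
    # are relabeled with the extra "unknown" class index `num_classes`.
    theta = np.quantile(u, quantile)
    return np.where(u > theta, num_classes, labels), theta

def features(u):
    # Hypothetical non-linear feature map [1, u, u^2]; an assumption of this sketch.
    return np.stack([np.ones_like(u), u, u ** 2], axis=1)

def extended_softmax(logits, u, w):
    # Step 4: append tau(u) = w . [1, u, u^2] as the logit of the "unknown" class.
    tau = features(u) @ w
    return softmax(np.concatenate([logits, tau[:, None]], axis=1))

def nll(probs, labels):
    # Mean negative log-likelihood of the given labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_calibration(logits, u, labels_ext, steps=2000, lr=0.5):
    # Step 3: gradient descent on the cross-entropy of the extended softmax.
    phi = features(u)
    is_unknown = (labels_ext == logits.shape[1]).astype(float)
    w = np.zeros(phi.shape[1])
    for _ in range(steps):
        p_unknown = extended_softmax(logits, u, w)[:, -1]
        # d(cross-entropy)/d(tau) = p_unknown - 1[label is "unknown"]
        w -= lr * phi.T @ (p_unknown - is_unknown) / len(u)
    return w

def reject_or_classify(logits, u, theta):
    # Baseline for contrast: hard abstention when u exceeds theta, plain softmax otherwise.
    reject = (u > theta).astype(float)[:, None]
    return np.concatenate([softmax(logits) * (1.0 - reject), reject], axis=1)

# Step 1: synthetic stand-in for a held-out validation set (hypothetical data).
rng = np.random.default_rng(0)
n, c = 2000, 3
labels = rng.integers(0, c, size=n)
logits = 3.0 * np.eye(c)[labels] + rng.normal(size=(n, c))  # classifier logits
u = rng.uniform(size=n)                                     # epistemic uncertainty scores

labels_ext, theta = relabel_unknown(labels, u, c)
w = fit_calibration(logits, u, labels_ext)
probs = extended_softmax(logits, u, w)
print("threshold:", round(float(theta), 3))
print("validation NLL:", round(float(nll(probs, labels_ext)), 3))
```

In this sketch the baseline puts all probability mass on "unknown" once the threshold is crossed, whereas the extended softmax lets the calibrated uncertainty score compete with the class logits, so the "unknown" probability varies smoothly with both sources of uncertainty.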