Test Time Augmentation Meets Post-hoc Calibration: Uncertainty Quantification under Real-World Conditions

Achim Hekler, Titus J. Brinker, Florian Buettner
DOI: https://doi.org/10.1609/aaai.v37i12.26735
2023-06-26
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract: Communicating the predictive uncertainty of deep neural networks transparently and reliably is important in many safety-critical applications such as medicine. However, modern neural networks tend to be poorly calibrated, resulting in wrong predictions made with high confidence. While existing post-hoc calibration methods like temperature scaling or isotonic regression yield well-calibrated predictions in artificial experimental settings, their effectiveness can degrade significantly in real-world applications, where scarcity of labeled data and domain drift are common. In this paper, we first investigate the impact of these characteristics on post-hoc calibration and introduce an easy-to-implement extension of common post-hoc calibration methods based on test time augmentation. In extensive experiments, we demonstrate that our approach results in substantially better calibration on various architectures. We demonstrate the robustness of our proposed approach on a real-world application for skin cancer classification and show that it facilitates safe decision-making under real-world uncertainties.
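The abstract names temperature scaling and test time augmentation but does not say how the two are combined. The following is a minimal sketch of one plausible reading, not the authors' procedure: logits are averaged over augmented views of each input, and a single temperature is then fitted on held-out data by minimizing the negative log-likelihood. All function names (`softmax`, `nll`, `fit_temperature`, `tta_logits`), the grid search, and the choice to average logits rather than probabilities are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits / temperature)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs + 1e-12))


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature that minimizes validation NLL.

    A log-spaced grid search is used here for simplicity (an assumption);
    gradient-based optimizers are a common alternative.
    """
    if grid is None:
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 400))
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def tta_logits(predict_logits, x, augmentations):
    """Average the model's logits over augmented copies of the batch x.

    predict_logits: hypothetical callable mapping a batch to logits.
    augmentations: list of callables, each mapping a batch to a batch.
    """
    return np.mean([predict_logits(aug(x)) for aug in augmentations], axis=0)


if __name__ == "__main__":
    # Synthetic, deliberately overconfident classifier: about 70% accurate,
    # but it puts almost all probability mass on its predicted class.
    rng = np.random.default_rng(0)
    n, k = 2000, 5
    labels = rng.integers(0, k, size=n)
    wrong = rng.random(n) > 0.7
    preds = np.where(wrong, (labels + rng.integers(1, k, size=n)) % k, labels)
    val_logits = 10.0 * np.eye(k)[preds] + 0.1 * rng.standard_normal((n, k))

    t = fit_temperature(val_logits, labels)
    print("fitted temperature:", t)
    print("NLL before:", nll(val_logits, labels, 1.0))
    print("NLL after: ", nll(val_logits, labels, t))
```

In use, `tta_logits` would be applied to both the held-out calibration set and the test inputs, so that the temperature is fitted on the same kind of averaged logits it is later applied to.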