ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

Hee Suk Yoon, Joshua Tian Jin Tee, Eunseop Yoon, Sunjae Yoon, Gwangsu Kim, Yingzhen Li, Chang D. Yoo
2024-01-18
Abstract: Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed that can be incorporated directly into the training process. However, these methods all rely on internal hyperparameters, and the performance of these calibration objectives depends on tuning them, which incurs more computational cost as neural networks and datasets grow larger. As such, we present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective loss, where we view the calibration error from the perspective of the squared difference between the two expectations. With extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into the training improves model calibration in various batch size settings without the need for internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational costs required for calibration during training due to the absence of internal hyperparameters. The code is publicly accessible at https://github.com/hee-suk-yoon/ESD.
Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor calibration of modern neural networks, which tend to make over-confident predictions. Traditionally, models are calibrated after training with post-processing methods such as temperature scaling and vector scaling. In recent years, several trainable calibration objectives have been proposed that can be incorporated directly into the training process. However, these objectives all contain internal hyperparameters, and their performance depends on tuning them, which increases the computational cost, especially as neural networks and datasets grow larger. The paper therefore proposes Expected Squared Difference (ESD), a trainable calibration objective loss that requires no tuning (i.e., it has no internal hyperparameters). ESD views the calibration error from the perspective of the squared difference between two expectations. Through extensive experiments on multiple architectures (such as CNNs and Transformers) and datasets, the paper shows that (1) incorporating ESD into training improves model calibration across various batch size settings without any internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous calibration approaches, and (3) because there are no internal hyperparameters to tune, ESD significantly reduces the computational cost of calibration during training, especially as model complexity and dataset size increase. Overall, ESD provides an efficient, hyperparameter-free way to improve model calibration during training, reducing the consumption of computational resources and improving the reliability of the model in practical applications.
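The text describes ESD only as a squared difference between two expectations. As a rough illustration of that idea, the sketch below assumes the two expectations are the mean confidence and the mean correctness, each restricted to samples whose confidence is at most a threshold, with the thresholds taken from the batch itself. The function name and this simple plug-in form are assumptions made for illustration and are not the authors' estimator; the linked repository contains the actual implementation.

```python
def squared_difference_calibration(confidences, correct):
    """Illustrative plug-in calibration measure (hypothetical helper).

    confidences: predicted top-class probabilities in [0, 1]
    correct:     1 if the corresponding prediction was right, else 0

    Assumption: thresholds (alpha) are the batch's own confidences, so
    there is no bin count or kernel width to tune.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for alpha in confidences:
        # Indices of samples whose confidence does not exceed the threshold.
        below = [i for i in range(n) if confidences[i] <= alpha]
        # First expectation: confidence restricted to those samples.
        mean_conf = sum(confidences[i] for i in below) / n
        # Second expectation: correctness restricted to the same samples.
        mean_acc = sum(correct[i] for i in below) / n
        # Squared difference between the two expectations.
        total += (mean_conf - mean_acc) ** 2
    # Average over all thresholds drawn from the batch.
    return total / n


if __name__ == "__main__":
    # Toy batches: confidence matches accuracy vs. confidence exceeds accuracy.
    calibrated = squared_difference_calibration([0.5, 0.5], [1, 0])
    overconfident = squared_difference_calibration([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
    print(calibrated, overconfident)
```

On these toy batches, the batch whose confidence matches its accuracy scores zero, while the over-confident batch scores a positive value. A trainable version would need a differentiable framework and the estimator from the paper; this sketch only shows the quantity being compared.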