Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting

Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu
2024-03-18
Abstract: Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Although recent TTA methods have shown promising performance, we still face two key challenges: 1) prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications; 2) while existing TTA methods can significantly improve the test performance on out-of-distribution data, they often suffer from severe performance degradation on in-distribution data after TTA (known as forgetting). To this end, we have proposed an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples for test-time entropy minimization. To alleviate forgetting, EATA introduces a Fisher regularizer estimated from test samples to constrain important model parameters from drastic changes. However, in EATA, the adopted entropy loss consistently assigns higher confidence to predictions even for samples that are inherently uncertain, leading to overconfident predictions. To tackle this, we further propose EATA with Calibration (EATA-C) to separately exploit the reducible model uncertainty and the inherent data uncertainty for calibrated TTA. Specifically, we measure the model uncertainty by the divergence between predictions from the full network and its sub-networks, on which we propose a divergence loss to encourage consistent predictions instead of overconfident ones. To further recalibrate prediction confidence, we utilize the disagreement among predicted labels as an indicator of the data uncertainty, and then devise a min-max entropy regularizer to selectively increase and decrease prediction confidence for different samples. Experiments on image classification and semantic segmentation verify the effectiveness of our methods.
Machine Learning
What problem does this paper attempt to address?
The paper mainly addresses the performance degradation of deep neural networks when the test data distribution differs from the training data distribution, and proposes two methods: Efficient Anti-forgetting Test-time Adaptation (EATA) and EATA with Calibration (EATA-C).

1. **EATA** Method:
   - **Objective**: To improve the efficiency of test-time adaptation (TTA) and address the "catastrophic forgetting" caused by existing TTA strategies.
   - **Technical Means**:
     - **Sample-efficient Entropy Minimization**: An active sample selection strategy reduces unnecessary backpropagation and so improves overall TTA efficiency. Specifically, prediction entropy is used to identify reliable samples, and redundant samples are excluded to further enhance efficiency.
     - **Anti-forgetting Weight Regularization**: An importance-aware regularizer (based on the Fisher information matrix) keeps parameters that are important to the in-distribution (ID) domain from changing drastically during TTA, thereby mitigating "catastrophic forgetting."
2. **EATA-C** Method:
   - **Objective**: To further address overconfident predictions, where the model gives high-confidence predictions even for uncertain data.
   - **Technical Means**:
     - **Model Uncertainty Reduction**: Model uncertainty is estimated from the discrepancy between the predictions of the full network and its sub-networks, and a consistency loss reduces this uncertainty instead of pushing predictions toward overconfidence.
     - **Prediction Uncertainty Recalibration**: The inconsistency between predicted labels serves as an indicator of data uncertainty, and a min-max entropy regularizer selectively raises or lowers the prediction confidence according to the inherent data uncertainty of each sample.
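The two EATA components above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the function names, the exponential weighting, the cosine-similarity redundancy check, and the thresholds `e_max` and `sim_max` are assumptions chosen for illustration, and the toy values are picked for a 4-class example only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def select_sample(probs, avg_probs, e_max, sim_max):
    """Return a sample weight; 0.0 means 'skip backpropagation for this sample'.

    Assumed rule: keep a sample only if its entropy is below e_max (reliable)
    and its prediction is not too similar to a running average of earlier
    predictions (non-redundant).
    """
    ent = entropy(probs)
    if ent >= e_max:
        return 0.0  # unreliable: prediction is too uncertain
    if avg_probs is not None and abs(cosine(probs, avg_probs)) >= sim_max:
        return 0.0  # redundant: too close to what was already seen
    return math.exp(e_max - ent)  # lower entropy gives a larger weight

def fisher_penalty(params, anchor_params, fisher):
    """Importance-weighted squared distance from the pre-adaptation parameters."""
    return sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))

# Toy usage with hypothetical values.
e_max = 0.4 * math.log(4)                  # assumed entropy threshold for 4 classes
confident = softmax([6.0, 0.0, 0.0, 0.0])  # low-entropy prediction
uncertain = softmax([0.1, 0.0, 0.0, 0.0])  # near-uniform prediction
w_confident = select_sample(confident, None, e_max, sim_max=0.9)
w_uncertain = select_sample(uncertain, None, e_max, sim_max=0.9)
```

In this sketch only samples with a non-zero weight would contribute to the entropy loss, and the Fisher penalty would be added to that loss with some trade-off coefficient, so parameters with large Fisher values stay close to their original values.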
In summary, the methods proposed in the paper aim to make test-time adaptation more efficient, stable, and accurate under distribution shifts: they improve the model's generalization to out-of-distribution test data while maintaining performance on in-distribution data, and EATA-C additionally targets better-calibrated prediction confidence.
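The two EATA-C terms described above (consistency between the full network and a sub-network, and the min-max entropy regularizer) can be sketched per sample as follows. This is a hedged illustration rather than the paper's exact formulation: the KL direction, the use of the full-network entropy, the weight `reg_weight`, and all function names are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def eata_c_terms(full_logits, sub_logits, reg_weight=0.1):
    """Return (divergence, signed_entropy, total) for one sample.

    divergence     : disagreement between full-network and sub-network
                     predictions, used here as a proxy for model uncertainty.
    signed_entropy : +entropy when the predicted labels agree (minimizing the
                     loss then raises confidence), -entropy when they disagree
                     (minimizing the loss then lowers confidence).
    """
    p_full = softmax(full_logits)
    p_sub = softmax(sub_logits)
    divergence = kl_divergence(p_full, p_sub)
    labels_agree = argmax(p_full) == argmax(p_sub)
    sign = 1.0 if labels_agree else -1.0
    signed_entropy = sign * entropy(p_full)
    total = divergence + reg_weight * signed_entropy
    return divergence, signed_entropy, total

# Toy usage with hypothetical logits.
agree_terms = eata_c_terms([4.0, 0.0, 0.0], [3.0, 0.5, 0.0])
disagree_terms = eata_c_terms([1.0, 0.9, 0.0], [0.9, 1.0, 0.0])
```

The sign flip is the point of the sketch: where the full network and the sub-network predict different labels, the sample is treated as inherently uncertain and its entropy is pushed up rather than down.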