Abstract: Since distribution shifts are likely to occur at test time and can drastically decrease the model's performance, online test-time adaptation (TTA) continues to update the model after deployment, leveraging the current test data. Clearly, a method proposed for online TTA has to perform well under all kinds of environmental conditions. By introducing the variable factors domain non-stationarity and temporal correlation, we first unfold all practically relevant settings and refer to them collectively as universal TTA. We want to highlight that this is the first work that covers such a broad spectrum, which is indispensable for practical use. To tackle the problem of universal TTA, we identify and highlight several challenges a self-training-based method has to deal with: 1) model bias and the occurrence of trivial solutions when performing entropy minimization on varying sequence lengths with and without multiple domain shifts, 2) loss of generalization, which exacerbates the adaptation to multiple domain shifts and the occurrence of catastrophic forgetting, and 3) performance degradation due to shifts in the class prior. To prevent the model from becoming biased, we leverage a dataset- and model-agnostic certainty and diversity weighting. In order to maintain generalization and prevent catastrophic forgetting, we propose to continually weight-average the source and adapted models. To compensate for disparities in the class prior during test time, we propose an adaptive prior correction scheme that reweights the model's predictions. We evaluate our approach, named ROID, on a wide range of settings, datasets, and models, setting new standards in the field of universal TTA. Code is available at: https://github.com/mariodoebler/test-time-adaptation
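The abstract builds its notion of universal TTA from two factors, domain non-stationarity and temporal correlation. The abstract does not describe how test streams are constructed, so the Python sketch below is only a rough illustration, under simplifying assumptions, of how a labeled test set could be ordered along both factors. Samples are reduced to hypothetical `(domain, label)` pairs, `build_stream` and its flags are illustrative names, and temporal correlation is modeled in its most extreme form, with each class arriving as one contiguous block.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # hypothetical stand-in for an image: (domain name, class label)


def build_stream(data: Dict[str, List[int]], mixed_domains: bool,
                 correlated: bool, seed: int = 0) -> List[Sample]:
    """Order test samples along the two factors (illustrative sketch only).

    mixed_domains=False: domains arrive one after another (continual shifts).
    mixed_domains=True:  samples from all domains are interleaved.
    correlated=False:    labels are shuffled, i.e. roughly i.i.d. over time.
    correlated=True:     each class arrives as one contiguous block (extreme case).
    """
    rng = random.Random(seed)
    per_domain = [[(name, y) for y in labels] for name, labels in data.items()]
    # Either one group per domain (kept in insertion order) or a single mixed pool.
    groups = [[s for d in per_domain for s in d]] if mixed_domains else per_domain
    stream: List[Sample] = []
    for group in groups:
        group = list(group)
        rng.shuffle(group)
        if correlated:
            # Random class order; the stable sort keeps the shuffled order within a class.
            class_order = sorted({y for _, y in group})
            rng.shuffle(class_order)
            rank = {c: i for i, c in enumerate(class_order)}
            group.sort(key=lambda s: rank[s[1]])
        stream.extend(group)
    return stream
```

For example, `build_stream({"fog": [0, 1, 0, 1], "snow": [0, 1, 0, 1]}, mixed_domains=False, correlated=True)` yields all `fog` samples before all `snow` samples, with each domain's labels forming two runs. A milder degree of correlation would replace the block-wise sort with a randomized class schedule.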

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to let a model keep adapting to distribution shifts at test time while maintaining good performance under a wide range of environmental conditions. Specifically, the paper examines how online test-time adaptation (TTA) methods behave under domain non-stationarity and temporal correlation. The authors point out that existing TTA methods usually target specific settings and ignore many scenarios that may arise in practice. They therefore propose a method named ROID, aimed at universal TTA, that is, adapting well to any domain and performing well across a wide range of settings, datasets, and models.

The paper identifies three major challenges:

1. **Model bias and trivial solutions**: When entropy minimization is performed on test sequences of varying length, with or without multiple domain shifts, self-training-based methods can become biased toward certain classes or collapse to trivial solutions.
2. **Loss of generalization**: As adaptation proceeds, the model may gradually lose its ability to generalize to unseen data. This makes adapting to multiple domain shifts harder and can lead to catastrophic forgetting.
3. **Performance degradation due to class-prior shifts**: Changes in the class prior during testing degrade the model's predictions.

To overcome these challenges, the authors propose the following techniques:

- **Certainty and diversity weighting**: A dataset- and model-agnostic weighting of the test samples prevents the model from becoming biased toward certain classes.
- **Weight ensembling**: Continually computing a weighted average of the source model's and the adapted model's weights maintains generalization and prevents catastrophic forgetting.
- **Prior correction**: An adaptive prior correction scheme reweights the model's predictions to compensate for shifts in the class prior during testing.

Overall, the goal of this paper is to provide a universal TTA method that works effectively across diverse practical scenarios, thereby improving the robustness and adaptability of models in the real world.
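The three components (certainty and diversity weighting, weight averaging of the source and adapted models, and prior correction) can be illustrated with a minimal, framework-free Python sketch operating on softmax probability vectors and flattened weight vectors. It is not the authors' implementation (the linked repository contains that), and the concrete formulas are assumptions: certainty is taken as `exp(-entropy)`, diversity as one minus the cosine similarity to an exponential moving average of past predictions, the self-training loss as a weighted entropy, the averaging momentum as `alpha = 0.99`, and the estimated prior is smoothed toward uniform with a fixed constant rather than the adaptive rule the paper describes. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def entropy(p: Vector) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def certainty_weight(p: Vector) -> float:
    # Assumed form: low-entropy (confident) predictions receive larger weights.
    return math.exp(-entropy(p))


def diversity_weight(p: Vector, mean_prob: Vector) -> float:
    # Assumed form: predictions resembling the running average (e.g. a class the
    # model is drifting toward) are down-weighted.
    return 1.0 - cosine(p, mean_prob)


def update_mean_prob(mean_prob: Vector, probs: List[Vector], momentum: float = 0.9) -> Vector:
    # Exponential moving average of batch-mean predictions; initialize with a uniform vector.
    batch_mean = [sum(col) / len(probs) for col in zip(*probs)]
    return [momentum * m + (1 - momentum) * b for m, b in zip(mean_prob, batch_mean)]


def weighted_entropy_loss(probs: List[Vector], mean_prob: Vector) -> float:
    # Self-training objective: per-sample entropy scaled by certainty x diversity.
    total = 0.0
    for p in probs:
        total += certainty_weight(p) * diversity_weight(p, mean_prob) * entropy(p)
    return total / len(probs)


def ensemble_weights(adapted: Vector, source: Vector, alpha: float = 0.99) -> Vector:
    # Continual weight averaging: theta <- alpha * theta_adapted + (1 - alpha) * theta_source.
    return [alpha * a + (1 - alpha) * s for a, s in zip(adapted, source)]


def prior_correct(probs: List[Vector], smoothing: float = 0.1) -> List[Vector]:
    # Estimate the batch class prior, smooth it toward uniform with a FIXED constant,
    # and reweight each prediction by it (the source prior is assumed uniform).
    num_classes = len(probs[0])
    prior = [sum(col) / len(probs) for col in zip(*probs)]
    prior = [(q + smoothing) / (1 + smoothing * num_classes) for q in prior]
    corrected = []
    for p in probs:
        scaled = [pi * qi for pi, qi in zip(p, prior)]
        norm = sum(scaled)
        corrected.append([s / norm for s in scaled])
    return corrected
```

One plausible ordering per test batch: compute `probs`, update the running mean, take a gradient step on `weighted_entropy_loss` (omitted here; in practice an autodiff framework would compute it), apply `ensemble_weights` to the updated parameters, and output `prior_correct(probs)`. For `probs = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]]`, the ambiguous third sample is corrected to roughly `[0.694, 0.306]`, pulled toward the class that dominates the batch.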

Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction
