Abstract: Machine learning algorithms have achieved remarkable success across various disciplines, use cases and applications, under the prevailing assumption that training and test samples are drawn from the same distribution. Consequently, these algorithms struggle and become brittle even when samples in the test distribution start to deviate from the ones observed during training. Domain adaptation and domain generalization have been studied extensively as approaches to address distribution shifts across test and train domains, but each has its limitations. Test-time adaptation, a recently emerging learning paradigm, combines the benefits of domain adaptation and domain generalization by training models only on source data and adapting them to target data during test-time inference. In this survey, we provide a comprehensive and systematic review on test-time adaptation, covering more than 400 recent papers. We structure our review by categorizing existing methods into five distinct categories based on what component of the method is adjusted for test-time adaptation: the model, the inference, the normalization, the sample, or the prompt, providing detailed analysis of each. We further discuss the various preparation and adaptation settings for methods within these categories, offering deeper insights into the effective deployment for the evaluation of distribution shifts and their real-world application in understanding images, video and 3D, as well as modalities beyond vision. We close the survey with an outlook on emerging research opportunities for test-time adaptation.

What problem does this paper attempt to address?

The problem this paper addresses is the poor performance of machine-learning models at test time when they encounter distribution shifts. Many machine-learning algorithms assume that the training data and the test data come from the same distribution, but in practical applications this assumption often does not hold. During inference, a model may encounter noisy sensor recordings, sudden changes in weather conditions, evolving user needs, or completely new and unforeseen targets, each of which can make the test distribution differ from the training distribution. Once the test distribution begins to deviate from the training distribution, model performance can decline significantly and the model becomes brittle.

To address this, the paper explores a paradigm called "test-time adaptation". Its goal is to adapt the model during the testing phase so as to reduce the negative impact of the gap between the training and test distributions. Test-time adaptation methods train the model using only source data during the training phase. During the testing phase, they adjust the model parameters or the representation of the test data using a small amount of unlabeled target data, thereby improving the performance and robustness of the model on specific test samples.

### Main Problem Definition

1. **Distribution Shift**: Test-time adaptation focuses on the distribution shift problem in machine-learning algorithms, which is expressed as:

   \[ p(x_t, y_t) \neq p(x_s, y_s) \]

   where \(p(x_s, y_s)\) is the source distribution and \(p(x_t, y_t)\) is the target distribution. This mismatch leads to inaccurate predictions when the source-trained model \(f_{\theta_s}\) is applied to data from the target distribution.

2. **Four Common Types of Distribution Shifts**:

   - **Covariate Shifts**: Only the input distribution \(p(x)\) changes, while the labels for given input features remain unchanged.

     \[ p(x_t) \neq p(x_s), \quad p(y_t|x_t) = p(y_s|x_s) \]

   - **Label Shifts**: Only the label distribution \(p(y)\) changes, while the data distribution for given labels remains unchanged.

     \[ p(y_t) \neq p(y_s), \quad p(x_t|y_t) = p(x_s|y_s) \]

   - **Concept Shifts**: The input distribution is the same, but the conditional distribution of labels given inputs changes, for example because of noisy labels or different annotation methods.

     \[ p(x_t) = p(x_s), \quad p(y_t|x_t) \neq p(y_s|x_s) \]

   - **Conditional Shifts**: The label distribution remains unchanged, but the distribution of the input samples given the labels changes.

     \[ p(y_t) = p(y_s), \quad p(x_t|y_t) \neq p(x_s|y_s) \]

3. **Test-time Adaptation**: Given the labeled source distribution \(S\) and the unlabeled target distribution \(T\), test-time adaptation trains the model \(f_{\theta_s}\) on the source distribution \(S\) only. During the testing phase, it adapts using the source-trained model \(f_{\theta_s}\) and the target data \(x_t\), and then makes predictions on \(x_t\). The adaptation can be carried out online or in batches, and does not require a large amount of target data.

### Difference from Related Problems

- **Domain Adaptation**: Narrows the domain gap by accessing both source and target data, but assumes that the target data is available during training.
- **Domain Generalization**: Avoids the need for target data during training.
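To make the definitions above concrete, here is a minimal, self-contained sketch of one simple form of test-time adaptation: re-estimating input normalization statistics from an unlabeled target batch. It is an illustration under stated assumptions, not a method taken from the survey. The 1-D two-class data, the constant offset of 3.0 standing in for a sensor drift, and the helper names `sample_domain`, `predict`, and `accuracy` are all hypothetical. Because the offset changes \(p(x|y)\) while \(p(y)\) stays balanced, the toy shift corresponds to a conditional shift in the terms used above.

```python
import random
import statistics


def sample_domain(n, offset, rng):
    """Draw n labeled points: class 0 centered at -1, class 1 at +1, plus an offset."""
    xs, ys = [], []
    for i in range(n):
        y = i % 2  # balanced labels, so p(y) is identical in both domains
        xs.append(rng.gauss(2 * y - 1, 0.5) + offset)
        ys.append(y)
    return xs, ys


def predict(xs, mean, std):
    """Source-trained rule: standardize the input, then threshold at zero."""
    return [1 if (x - mean) / std > 0 else 0 for x in xs]


def accuracy(preds, ys):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


rng = random.Random(0)
x_s, y_s = sample_domain(1000, offset=0.0, rng=rng)  # source domain
x_t, y_t = sample_domain(1000, offset=3.0, rng=rng)  # target domain (assumed offset)

# "Training": the only fitted quantities are the source normalization statistics.
mu_s, sd_s = statistics.fmean(x_s), statistics.pstdev(x_s)

# No adaptation: apply the source statistics to the target batch.
acc_source = accuracy(predict(x_s, mu_s, sd_s), y_s)
acc_no_adapt = accuracy(predict(x_t, mu_s, sd_s), y_t)

# Test-time adaptation: re-estimate the statistics from unlabeled target inputs.
# The target labels y_t are used only to score the result, never to adapt.
mu_t, sd_t = statistics.fmean(x_t), statistics.pstdev(x_t)
acc_adapted = accuracy(predict(x_t, mu_t, sd_t), y_t)

print(f"source: {acc_source:.3f}  target (no adaptation): {acc_no_adapt:.3f}  "
      f"target (adapted): {acc_adapted:.3f}")
```

In this toy setting the unadapted model labels almost every target point as class 1, so its accuracy falls to roughly chance, while the adapted statistics restore accuracy close to the source level. The sketch relies on the target batch being class-balanced. Under a label shift the batch mean would no longer sit midway between the classes, and this simple re-centering would be biased.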

Beyond Model Adaptation at Test Time: A Survey
