Abstract: Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributions. Despite these efforts, a unified and comprehensive benchmark has yet to be established. To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable. Each scenario within the benchmark is fully described by a Markov state transition matrix for sampling from the original dataset. The UniTTA benchmark considers both domain and class as two independent dimensions of data and addresses various combinations of imbalance/balance and i.i.d./non-i.i.d./continual conditions, covering a total of \( (2 \times 3)^2 = 36 \) scenarios. It establishes a comprehensive evaluation benchmark for realistic TTA and provides a guideline for practitioners to select the most suitable TTA method. Alongside this benchmark, we propose a versatile UniTTA framework, which includes a Balanced Domain Normalization (BDN) layer and a COrrelated Feature Adaptation (COFA) method, designed to mitigate distribution gaps in domain and class, respectively. Extensive experiments demonstrate that our UniTTA framework excels within the UniTTA benchmark and achieves state-of-the-art performance on average. Our code is available at https://github.com/LeapLabTHU/UniTTA.
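The abstract states that each scenario is described by a Markov state transition matrix used to sample from the original dataset. The sketch below illustrates that idea only; it is not the authors' construction. The function names and the single `stay_prob` parameter are assumptions made for illustration: a state (a class or a domain index) is kept with probability `stay_prob` and otherwise replaced by a uniformly chosen other state.

```python
import random


def make_transition_matrix(num_states, stay_prob):
    """Hypothetical helper: build a row-stochastic matrix.

    Each state is kept with probability stay_prob; the remaining
    probability mass is split evenly over the other states.
    """
    if num_states == 1:
        return [[1.0]]
    move = (1.0 - stay_prob) / (num_states - 1)
    return [
        [stay_prob if i == j else move for j in range(num_states)]
        for i in range(num_states)
    ]


def sample_state_sequence(matrix, length, seed=0):
    """Hypothetical helper: draw a state sequence from the Markov chain."""
    if length <= 0:
        return []
    rng = random.Random(seed)
    states = range(len(matrix))
    state = rng.randrange(len(matrix))  # uniform initial state (assumption)
    sequence = [state]
    for _ in range(length - 1):
        # The next state depends only on the current one (Markov property).
        state = rng.choices(states, weights=matrix[state])[0]
        sequence.append(state)
    return sequence


# Temporally correlated stream: long runs of the same class.
correlated = sample_state_sequence(make_transition_matrix(4, 0.95), 20)
# Uniform rows (stay_prob = 1 / num_states) reduce to i.i.d. sampling.
iid = sample_state_sequence(make_transition_matrix(4, 0.25), 20)
```

Under this parameterization, a `stay_prob` close to 1 produces a temporally correlated (non-i.i.d.) stream, while `stay_prob = 1 / num_states` makes every row uniform and recovers i.i.d. sampling. Because the abstract treats domain and class as independent dimensions, one such chain could be run for each; how the benchmark encodes imbalance and continual shifts in its matrices is not specified here.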

What problem does this paper attempt to address?

This paper addresses the challenges that Test-Time Adaptation (TTA) methods face in real-world applications. It proposes a unified, comprehensive evaluation benchmark (the UniTTA benchmark) and a versatile framework (the UniTTA framework) for handling data distribution shifts in practical scenarios.

### Main Problems Addressed

1. **Lack of a Unified Evaluation Benchmark**: Existing TTA research has considered various challenging scenarios (such as continual domain shifts, mixed domains, and temporally correlated data), but no unified and comprehensive benchmark exists for evaluating these methods.
2. **Handling Complex Data Distribution Shifts**: In practical applications, data may undergo several types of change at once, including domain shifts, class imbalance, and non-independent and identically distributed (non-i.i.d.) sampling. Existing TTA methods often address only specific types of challenge and are not universally applicable.

### UniTTA benchmark

- **Objective**: To establish a comprehensive evaluation benchmark that covers the data distribution shifts that may be encountered in the real world.
- **Implementation Method**: Each test data stream is generated by sampling from the original dataset according to a Markov state transition matrix. Different matrices simulate different domain and class shifts, covering the possible scenario combinations.
- **Coverage**: The benchmark treats domain and class as two independent dimensions. Each dimension can be balanced or imbalanced and can follow i.i.d., non-i.i.d., or continual conditions, giving \( (2 \times 3)^2 = 36 \) scenarios in total.

### UniTTA framework

- **Components**:
  - **Balanced Domain Normalization (BDN) Layer**: Aims to mitigate issues caused by domain and class distribution differences by calculating statistics for each class and averaging them across classes to obtain balanced domain statistics.
  - **COrrelated Feature Adaptation (COFA) Method**: Exploits temporal correlation in the data stream, using information from previous samples to improve the prediction for the current sample, and employs a confidence filtering strategy to maintain good performance under i.i.d. conditions.
- **Advantages**: The framework performs well across the UniTTA benchmark and achieves state-of-the-art performance on average.

In summary, by proposing the UniTTA benchmark and the UniTTA framework, this paper aims to provide a more comprehensive approach to evaluating and addressing test-time adaptation under real-world data distribution shifts.
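The BDN description above (per-class statistics averaged across classes) can be illustrated with a minimal sketch. This is not the authors' layer: it computes only a mean, ignores variance and the domain dimension, and assumes class labels are given, whereas at test time they would presumably have to be pseudo-labels from the model. The function name `class_balanced_mean` is hypothetical.

```python
def class_balanced_mean(features, labels):
    """Hypothetical helper: average per-class means.

    Every class observed in the batch contributes equally to the
    result, regardless of how many samples it has.
    """
    if not features:
        raise ValueError("features must not be empty")
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # One mean vector per observed class.
    class_means = [[s / counts[y] for s in sums[y]] for y in sums]
    dim = len(class_means[0])
    # Equal-weight average over classes, not over samples.
    return [sum(m[d] for m in class_means) / len(class_means) for d in range(dim)]


# Imbalanced batch: nine samples of class 0 and one sample of class 1.
features = [[0.0]] * 9 + [[10.0]]
labels = [0] * 9 + [1]

plain_mean = sum(x[0] for x in features) / len(features)  # 1.0, dominated by class 0
balanced_mean = class_balanced_mean(features, labels)     # [5.0]
```

In this toy batch the plain mean is pulled toward the majority class, while the class-balanced mean weights both classes equally. That is the kind of imbalance-induced bias in normalization statistics that the summary says BDN is designed to reduce.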

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation