Abstract: Generalizing deep learning models to unknown target domain distributions with low latency has motivated research into test-time training/adaptation (TTT/TTA). Existing approaches often focus on improving test-time training performance under well-curated target domain data. As this work shows, many state-of-the-art methods fail to maintain their performance when the target domain is contaminated with strong out-of-distribution (OOD) data, a setting referred to as open-world test-time training (OWTTT). The failure is mainly due to the inability to distinguish strong OOD samples from regular weak OOD samples. To improve the robustness of OWTTT, we first develop an adaptive strong OOD pruning method, which improves the efficacy of the self-training TTT method. We further propose a way to dynamically expand the prototypes to represent strong OOD samples for improved weak/strong OOD data separation. Finally, we regularize self-training with distribution alignment, and the combination yields state-of-the-art performance on 5 OWTTT benchmarks. The code is available at https://github.com/Yushu-Li/OWTTT.

What problem does this paper attempt to address?

The paper focuses on the challenges of Open-World Test-Time Training (OWTTT), in particular the case where the target domain data is contaminated with Strong Out-of-Distribution (Strong OOD) samples.

### Research Background and Problem Definition

- **Test-Time Training (TTT)**: A method that allows a pre-trained model to adapt to unknown target domain data during the inference phase without accessing source domain data.
- **Problems with Existing Methods**: Many existing TTT methods perform poorly when the target domain data contains Strong OOD samples. These samples may come from different semantic categories or may be just random noise, and the model struggles to distinguish them from normal Weak OOD samples.
- **Specific Challenges**:
  - Self-training methods struggle to handle Strong OOD samples correctly because they must assign every test sample to a known category.
  - Distribution alignment-based methods are also affected when Strong OOD samples are included in the estimate of the target domain distribution.

### Solution Overview

The paper proposes a two-stage method to improve the robustness of OWTTT:

1. **Strong OOD Sample Pruning**:
   - Identifies and excludes Strong OOD samples without requiring hyperparameters, reducing their negative impact on the self-training process.
   - Uses a dynamic threshold to distinguish between Strong OOD and Weak OOD samples.
2. **Prototype Expansion**:
   - Dynamically expands the prototype pool to include new prototypes representing Strong OOD samples.
   - This allows Strong OOD samples to form tighter clusters in the feature space, which better separates Weak OOD samples from Strong OOD samples.

Additionally, the paper incorporates distribution alignment as a regularization term to further enhance the model's robustness, and it proposes a benchmark covering multiple types of domain shift to evaluate the OWTTT protocol.
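To make the pruning step more concrete, here is a minimal sketch of a dynamic threshold for separating Strong OOD from Weak OOD samples. The summary does not specify the scoring function or the thresholding rule, so both are assumptions here: the score is taken as one minus the largest cosine similarity to any known-class prototype, and the threshold is chosen Otsu-style by minimizing the within-group variance of the scores. The names `ood_scores`, `adaptive_threshold`, and `num_candidates` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def ood_scores(features, prototypes):
    """Assumed score: 1 - max cosine similarity to any known-class prototype."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return 1.0 - (f @ p.T).max(axis=1)

def adaptive_threshold(scores, num_candidates=100):
    """Pick the split that minimizes the size-weighted within-group variance."""
    # Candidate thresholds on an even grid strictly inside the score range.
    candidates = np.linspace(scores.min(), scores.max(), num_candidates + 2)[1:-1]
    best_tau, best_cost = candidates[0], np.inf
    for tau in candidates:
        low, high = scores[scores <= tau], scores[scores > tau]
        if low.size == 0 or high.size == 0:
            continue
        cost = low.size * low.var() + high.size * high.var()
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau

# Toy data: two known classes (weak OOD = noisy copies) plus one unknown cluster.
rng = np.random.default_rng(0)
prototypes = np.eye(3)[:2]
weak = np.repeat(prototypes, 20, axis=0) + 0.05 * rng.normal(size=(40, 3))
strong = np.array([0.0, 0.0, 1.0]) + 0.05 * rng.normal(size=(20, 3))
features = np.vstack([weak, strong])

scores = ood_scores(features, prototypes)
tau = adaptive_threshold(scores)
is_strong = scores > tau  # samples flagged here would be excluded from self-training
print(f"threshold={tau:.3f}, pruned={int(is_strong.sum())} of {len(scores)}")
```

The threshold is recomputed from whatever scores are passed in, so it adapts to the data instead of being fixed in advance; `num_candidates` only sets the grid resolution of the search. This split works when the scores are roughly bimodal, which is an assumption of the sketch.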
### Main Contributions

- Identifies an important issue overlooked in existing TTT research: TTT methods may fail when the target domain contains Strong OOD samples, the setting the paper calls OWTTT.
- Proposes a prototype clustering-based baseline method and develops a Strong OOD detector and a prototype expansion technique to improve robustness under OWTTT.
- Establishes a benchmark covering various types of domain shift, including common corruptions and style transfers, and achieves state-of-the-art performance on the proposed benchmark.
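The prototype expansion idea can be illustrated with a small sketch. It assumes a cosine-similarity score against the combined bank of known-class and already-added prototypes, and it uses a fixed threshold `tau` purely for brevity, whereas the summary describes the threshold as dynamic. The function name `expand_prototypes` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def _unit(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def expand_prototypes(features, known_protos, tau):
    """Grow a pool of strong-OOD prototypes from a stream of features.

    A sample becomes a new prototype when it is far (score > tau) from every
    known-class prototype and from every prototype already in the pool.
    """
    known = _unit(known_protos)
    pool = np.empty((0, known.shape[1]))
    for f in _unit(features):
        bank = np.vstack([known, pool])
        score = 1.0 - (bank @ f).max()  # 1 - max cosine similarity to the bank
        if score > tau:
            pool = np.vstack([pool, f])
    return pool

# Toy stream: two known classes and two unknown clusters, shuffled together.
rng = np.random.default_rng(0)
centers = np.eye(4)
labels = rng.permutation(np.repeat(np.arange(4), 15))
stream = centers[labels] + 0.05 * rng.normal(size=(60, 4))

pool = expand_prototypes(stream, known_protos=centers[:2], tau=0.5)
print("strong-OOD prototypes added:", pool.shape[0])
```

In this toy stream the pool should end up with one prototype per unknown cluster. Later samples from the same cluster are already close to an existing pool entry, so they are not added again, which is how the expanded prototypes let Strong OOD samples form their own clusters.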

On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion
