Abstract: Fully test-time adaptation aims to adapt a network model online based on sequential analysis of input samples during the inference stage. We observe that, when applying a transformer network model to a new domain, the self-attention profiles of image samples in the target domain deviate significantly from those in the source domain, which results in large performance degradation during domain changes. To address this important issue, we propose a new structure for the self-attention modules in the transformer. Specifically, we incorporate three domain-conditioning vectors, called domain conditioners, into the query, key, and value components of the self-attention module. We learn a network to generate these three domain conditioners from the class token at each transformer network layer. We find that, during fully online test-time adaptation, these domain conditioners at each transformer network layer are able to gradually remove the impact of domain shift and largely recover the original self-attention profile. Our extensive experimental results demonstrate that the proposed domain-conditioned transformer significantly improves the online fully test-time domain adaptation performance and outperforms existing state-of-the-art methods by large margins.
What problem does this paper attempt to address?
This paper addresses a problem in fully test-time adaptation (FTTA): when a pre-trained Transformer model is applied to a new target domain, the difference in data distribution between the source and target domains causes significant performance degradation. Specifically, the authors observe that the self-attention profiles of target-domain image samples deviate significantly from those of source-domain samples, which leads to a large performance drop under domain changes.
To address this, the authors propose a new structure, the **Domain-Conditioned Transformer (DCT)**, which introduces three domain-conditioning vectors (domain conditioners) and incorporates them into the query, key, and value components of the self-attention module, respectively. These vectors are intended to capture domain-specific perturbations and remove them layer by layer, gradually restoring the original self-attention profile and improving the model's performance on the new domain.
### Specific Problem Description
1. **Domain Shift Problem**: When a Transformer model is transferred from the source domain to the target domain, the change in data distribution perturbs its self-attention mechanism, which degrades performance.
2. **Online Adaptation Challenge**: The model must be adjusted online to the target-domain data distribution without access to any source-domain data. Adaptation happens in real time: each update relies only on the current mini-batch of test samples.
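To make the online setting in item 2 concrete, here is a minimal NumPy sketch of a fully test-time adaptation loop. It assumes a Tent-style entropy-minimization objective and a toy linear classifier whose weights `W` stay frozen while a single bias vector `b` is adapted; the objective, the toy model, and the names `adapt_online`, `W`, and `b` are illustrative assumptions, not DCT's actual training recipe.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_entropy(p):
    # Average Shannon entropy of the per-sample class distributions.
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())


def entropy_grad_logits(p):
    # Per-sample gradient of H(softmax(z)) w.r.t. the logits z:
    # dH/dz_j = -p_j * (log p_j + H).
    logp = np.log(p + 1e-12)
    h = -(p * logp).sum(axis=1, keepdims=True)
    return -p * (logp + h)


def adapt_online(stream, W, b, lr=0.1):
    """Fully test-time adaptation over a stream of unlabeled mini-batches.

    W is frozen (a stand-in for the pre-trained source model); only the
    hypothetical adaptable parameter b is updated, by one gradient step on
    the mean prediction entropy per batch. No source data is used, and
    each batch is seen exactly once.
    """
    predictions = []
    for x in stream:
        p = softmax(x @ W + b)
        predictions.append(p.argmax(axis=1))  # predict first, then adapt
        b = b - lr * entropy_grad_logits(p).mean(axis=0)
    return predictions, b
```

The loop predicts on each batch before using it for an update, which matches the constraint that only the current mini-batch is available at each step; in DCT the adapted parameters would be those of the domain-condition generators rather than a classifier bias.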
### Solution
The Domain-Conditioned Transformer (DCT) addresses this problem in three ways:
- **Introducing Domain Conditioners**: In each Transformer layer, three domain-conditioning vectors \(C_q, C_k, C_v\) are introduced. They are generated from the class token by a lightweight neural network, the domain-condition generator \(\Phi_l\), where \(l\) indexes the layer.
- **Removing Domain Shift Layer by Layer**: Because conditioners are introduced in every Transformer layer, the influence of the domain shift is removed gradually and the original self-attention profile is restored.
- **Online Learning**: The domain-condition generators are continuously updated during testing, so the model adapts to the target-domain data distribution in real time.
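The per-layer generator can be sketched as follows, assuming a small bottleneck MLP for \(\Phi_l\). The layer sizes, the ReLU, the initialization scale, the assumption that row 0 of the token matrix is the class token, and the names `DomainConditionGenerator` and `conditioners_per_layer` are all hypothetical, since the summary only states that \(\Phi_l\) is lightweight and takes the class token as input.

```python
import numpy as np


class DomainConditionGenerator:
    """Hypothetical sketch of the per-layer generator Phi_l.

    Maps a layer's class token (shape (d,)) to three domain conditioners
    C_q, C_k, C_v (each shape (d,)). The bottleneck MLP d -> r -> 3d with
    ReLU is an assumption; the source only says Phi_l is lightweight.
    """

    def __init__(self, d, r, rng):
        self.W1 = rng.normal(0.0, 0.02, size=(d, r))
        self.b1 = np.zeros(r)
        self.W2 = rng.normal(0.0, 0.02, size=(r, 3 * d))
        self.b2 = np.zeros(3 * d)

    def __call__(self, cls_token):
        hidden = np.maximum(cls_token @ self.W1 + self.b1, 0.0)  # ReLU
        out = hidden @ self.W2 + self.b2                          # shape (3d,)
        c_q, c_k, c_v = np.split(out, 3)                          # three (d,) vectors
        return c_q, c_k, c_v


def conditioners_per_layer(generators, layer_tokens):
    """One generator per layer; row 0 of each layer's token matrix is
    assumed to be the class token that drives that layer's conditioners."""
    return [gen(tokens[0]) for gen, tokens in zip(generators, layer_tokens)]
```

In an actual forward pass the class token at layer \(l\) is the output of layer \(l-1\), so each generator would be called inside its own layer rather than on precomputed tokens as in this toy helper. Under online learning, it is these generator weights that are updated.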
### Experimental Results
Experiments show that DCT significantly improves online fully test-time domain adaptation performance and outperforms existing state-of-the-art methods by large margins on multiple benchmark datasets.
### Formula Summary
- The output of the self-attention mechanism, where \(d\) is the query/key dimension, is defined as:
\[
\text{Attention}(Q, K, V)=\text{softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V
\]
- With the domain conditioners appended as extra rows to the query, key, and value matrices, self-attention becomes:
\[
\bar{Q}=\begin{bmatrix}Q\\C_q\end{bmatrix}, \quad \bar{K}=\begin{bmatrix}K\\C_k\end{bmatrix}, \quad \bar{V}=\begin{bmatrix}V\\C_v\end{bmatrix}
\]
\[
\text{Attention}(\bar{Q}, \bar{K}, \bar{V})=\text{softmax}\left(\frac{\bar{Q}\bar{K}^{\top}}{\sqrt{d}}\right)\bar{V}
\]
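The two formulas above can be checked numerically with a short NumPy sketch. It follows the stacked matrices \(\bar{Q}, \bar{K}, \bar{V}\) literally, appending one conditioner row to each of \(Q, K, V\) for a single head with \(N\) tokens; multi-head splitting and batching are omitted, and the function names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for Q, K, V of shape (N, d).

    Returns the output and the attention weights (the self-attention profile).
    """
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return weights @ V, weights


def domain_conditioned_attention(Q, K, V, c_q, c_k, c_v):
    """Stack one conditioner row under each of Q, K, V (the bar matrices),
    then apply standard scaled dot-product attention."""
    Q_bar = np.vstack([Q, c_q])  # (N + 1, d); vstack treats c_q as a row
    K_bar = np.vstack([K, c_k])
    V_bar = np.vstack([V, c_v])
    return attention(Q_bar, K_bar, V_bar)
```

As written, the stacked formula yields \(N+1\) output rows, one of which comes from the conditioner query \(C_q\); the sketch returns all of them, and what a full model does with that extra row is not specified here.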
With these modifications, DCT adapts effectively to new target domains during fully online test-time adaptation and improves the model's performance on them.