Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

Shilei Cao, Yan Liu, Juepeng Zheng, Weijia Li, Runmin Dong, Haohuan Fu
2024-08-18
Abstract: Real-world application models are commonly deployed in dynamic environments, where the target domain distribution undergoes temporal changes. Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains. Despite recent advancements in addressing CTTA, two critical issues remain: 1) Fixed thresholds for pseudo-labeling in existing methodologies generate low-quality pseudo-labels, as model confidence varies across categories and domains; 2) Stochastic parameter restoration methods for mitigating catastrophic forgetting fail to effectively preserve critical information due to their intrinsic randomness. To tackle these challenges for detection models in CTTA scenarios, we present CTAOD, featuring three core components. Firstly, the object-level contrastive learning module extracts object-level features for contrastive learning to refine the feature representation in the target domain. Secondly, the adaptive monitoring module dynamically skips unnecessary adaptation and updates the category-specific threshold based on predicted confidence scores to enable efficiency and improve the quality of pseudo-labels. Lastly, the data-driven stochastic restoration mechanism selectively resets inactive parameters with higher probability, ensuring the retention of essential knowledge. We demonstrate the effectiveness of CTAOD on four CTTA object detection tasks, where CTAOD outperforms existing methods, especially achieving a 3.2 mAP improvement and a 20% increase in efficiency on the Cityscapes-to-Cityscapes-C CTTA task. The code will be released.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how an object detection model can adapt at test time in a continually changing environment (Continual Test-Time Adaptation, CTTA). It focuses on two challenges:

1. **Error accumulation caused by low-quality pseudo-labels**: Existing methods generate pseudo-labels with a fixed threshold, which ignores how model confidence differs across classes and domains. The resulting low-quality pseudo-labels introduce errors that degrade model performance.
2. **Retaining current-domain knowledge while alleviating forgetting of source knowledge in a dynamic environment**: Under continual distribution changes, existing methods are prone to catastrophic forgetting: the model loses previously learned knowledge and also struggles to retain the important information of the current domain.

To address these problems, the authors propose CTAOD (Continual Test-time Adaptation for Object Detection), which contains three core components:

- **Object-level Contrastive Learning (OCL)**: Extracts object-level features for contrastive learning to improve the feature representation in the target domain.
- **Adaptive Monitoring (AM)**: Dynamically skips unnecessary adaptation steps and updates the class-specific thresholds according to the predicted confidence scores, improving both efficiency and pseudo-label quality.
- **Data-driven Stochastic Restoration (DSR)**: Selectively resets inactive parameters to ensure the retention of key knowledge and enhance robustness to forgetting.

### Formula Summary

1. **Contrastive learning loss function**:
   \[ L_{cl}(x_t)=-\frac{1}{l}\sum_{i=1}^{l}\log\frac{\exp(f_T^i\cdot f_S^i/\tau)}{\sum_{j=1}^{l}\exp(f_T^i\cdot f_S^j/\tau)} \]
   where \(f_T^i\) and \(f_S^i\) are the features extracted by the teacher and student models, respectively, from different augmented views of the same object, \(\tau\) is the temperature parameter, and \(l\) is the number of features.
2. **KL-divergence loss function**:
   \[ L_{kl}(P\|Q)=\sum_{x\in X}P(x)\log\frac{P(x)}{Q(x)} \]
   which quantifies the difference between two probability distributions.
3. **Exponential moving average update of the Adaptive Monitoring module**:
   \[ l_{ema}^t\leftarrow\beta_s\cdot l_{ema}^{t-1}+(1-\beta_s)\cdot l_t \]
   where \(\beta_s\) is the update rate and \(l_t\) is the average prediction score of all classes.
4. **Dynamic threshold update formula**:
   \[ \delta_c^t\leftarrow\beta_t\cdot\delta_c^{t-1}+(1-\beta_t)\cdot\epsilon\cdot(l_c^t)^{1/2} \]
   where \(\delta_c^t\) is the threshold of class \(c\) at time step \(t\), \(\beta_t\) is the update rate, \(\epsilon\) provides a linear projection, and \(l_c^t\) is the average prediction score of class \(c\).
5. **Data-driven Stochastic Restoration mechanism**:
   \[ M_t = F_t<\eta \]
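To make the contrastive loss and the dynamic threshold update from the formula summary more concrete, here is a minimal NumPy sketch. It is an illustration of the two formulas as written above, not the authors' implementation. The function names and the default values for `tau`, `beta_t`, and `epsilon` are hypothetical choices made for this example.

```python
import numpy as np


def object_contrastive_loss(f_teacher, f_student, tau=0.07):
    """L_cl for l objects; row i of each (l, d) array describes the same object.

    tau=0.07 is an assumed default, not a value taken from the summary.
    """
    # logits[i, j] = f_T^i . f_S^j / tau
    logits = f_teacher @ f_student.T / tau
    # Subtract the row-wise max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching teacher/student pairs sit on the diagonal; average over l.
    return float(-np.mean(np.diag(log_prob)))


def update_class_threshold(delta_prev, class_score, beta_t=0.9, epsilon=0.5):
    """One EMA step for the threshold of a single class c.

    beta_t and epsilon defaults are assumptions made for illustration.
    """
    return beta_t * delta_prev + (1.0 - beta_t) * epsilon * np.sqrt(class_score)


# Toy usage: three objects with orthonormal features.
features = np.eye(3)
aligned = object_contrastive_loss(features, features, tau=1.0)
shuffled = object_contrastive_loss(features, features[[1, 2, 0]], tau=1.0)
new_threshold = update_class_threshold(0.5, 0.64)
```

In the toy usage, `aligned` comes out lower than `shuffled`, because the loss rewards each teacher feature for being closest to the student feature of the same object. The threshold update moves a class threshold slowly toward a scaled square root of that class's average score, so a class with consistently low confidence ends up with a lower threshold than a confidently predicted one.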