Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

Weixi Weng, Chun Yuan
DOI: https://doi.org/10.1609/aaai.v38i6.28405
2024-01-01
Abstract: Unsupervised domain adaptation object detection (UDAOD) research on Detection Transformer (DETR) mainly focuses on feature alignment, and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment methods based on mean teacher comprise a pretraining stage followed by a self-training stage, which face problems in obtaining a reliable pretrained model and in achieving consistent performance gains, respectively. The methods mentioned above have not yet explored how to utilize a third related domain, such as a target-like domain, to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e. Mean Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher and propose a module called Object Queries Knowledge Transfer (OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods, including Masked Domain Query-based Feature Alignment (MDQFA) and Masked Token-wise Feature Alignment (MTWFA), to alleviate domain shift in a more robust way. These methods not only prevent training stagnation and lead to a robust pretrained model in the pretraining stage, but also enhance the model's target performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.
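
For readers unfamiliar with the mean teacher scheme the abstract refers to, the following is a minimal, generic sketch of its two usual ingredients: the teacher's weights track an exponential moving average (EMA) of the student's weights, and the teacher's confident predictions on unlabeled target images serve as pseudo labels for the student. It is not the MTM implementation. The function names, the plain-dict representation of weights, the momentum of 0.999, and the score threshold of 0.5 are all assumptions chosen for illustration; the abstract does not state these details, and OQKT, MDQFA, and MTWFA are not modeled here.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` are hypothetical dicts mapping parameter
    names to floats; a real model would hold tensors instead.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


def filter_pseudo_labels(predictions, threshold=0.5):
    """Keep only teacher predictions whose score reaches the threshold.

    Each prediction is a hypothetical dict with a "score" key; the
    threshold value is an assumption, not taken from the paper.
    """
    return [p for p in predictions if p["score"] >= threshold]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    # The teacher moves only slightly toward the student on each step.
    teacher = ema_update(teacher, student, momentum=0.9)
    print(teacher)

    teacher_predictions = [
        {"label": "car", "score": 0.92},
        {"label": "person", "score": 0.31},
    ]
    # Only the confident prediction is kept as a pseudo label.
    print(filter_pseudo_labels(teacher_predictions))
```

Because the teacher is a slowly moving average, its predictions tend to be more stable than the student's, which is the usual motivation for using it as the source of pseudo labels in self-training.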