Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

Hao Li, Wei Wang, Cong Wang, Zhigang Luo, Xinwang Liu, Kenli Li, Xiaochun Cao
2024-02-05
Abstract: Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training. This is a practical yet challenging task, as it requires the model to address domain shift without incorporating target domain data into training. In this paper, we propose a novel phrase grounding-based style transfer (PGST) approach for the task. Specifically, we first define textual prompts to describe potential objects for each unseen target domain. Then, we leverage the grounded language-image pre-training (GLIP) model to learn the style of these target domains and achieve style transfer from the source to the target domain. The style-transferred source visual features are semantically rich and could be close to imaginary counterparts in the target domain. Finally, we employ these style-transferred visual features to fine-tune GLIP. By introducing imaginary counterparts, the detector could be effectively generalized to unseen target domains using only a single source domain for training. Extensive experimental results on five diverse weather driving benchmarks demonstrate that our proposed approach achieves state-of-the-art performance, even surpassing some domain adaptive methods that incorporate target domain images into the training process. The source code and pre-trained models will be made available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address the issue of Single-Domain Generalized Object Detection. Specifically:

1. **Single-Domain Generalized Object Detection Task**:
   - The goal of this task is to use data from a single source domain during training, enabling the model to generalize to multiple unseen target domains. This task is practically significant but highly challenging because it requires the model to handle domain shifts without having access to target domain data.

2. **Proposed New Method**:
   - The paper proposes a Phrase Grounding-based Style Transfer (PGST) method. By defining text prompts to describe potential objects in each unseen target domain and utilizing the GLIP model, the method achieves style transfer from the source domain to the target domain. This approach allows the visual features of the source domain to approximate their imagined counterparts in the target domain while preserving semantic information.

3. **Experimental Validation**:
   - Extensive experiments were conducted on 5 different weather driving benchmarks, and the results show that this method significantly improves mean Average Precision (mAP), even surpassing some domain adaptation methods that include target domain images.
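
The first step of the method, defining text prompts that describe potential objects in each unseen target domain, can be illustrated with a minimal sketch. The class names, domain descriptions, phrase template, and the period-separated caption format below are all assumptions made for illustration; the summary above does not state the prompts actually used in the paper, and this code does not call GLIP.

```python
# Hypothetical object classes and target-domain descriptions (illustration only;
# the paper's actual classes, domains, and prompt wording are not given above).
CLASSES = ["car", "bus", "person"]
TARGET_DOMAINS = {
    "night_rainy": "on a rainy night",
    "daytime_foggy": "on a foggy day",
}


def build_prompts(classes, domains):
    """Return a dict mapping each domain name to one phrase per object class."""
    return {
        name: [f"a {cls} {description}" for cls in classes]
        for name, description in domains.items()
    }


def to_caption(phrases):
    """Join phrases into one period-separated caption (assumed format)."""
    return ". ".join(phrases) + "."


prompts = build_prompts(CLASSES, TARGET_DOMAINS)
for name, phrases in prompts.items():
    print(name, "->", to_caption(phrases))
```

Each phrase pairs an object class with a description of the target domain, so a phrase-grounding model could, in principle, be asked to align source-domain features with text that describes conditions never seen in training images.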