Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

Hao Li, Wei Wang, Cong Wang, Zhigang Luo, Xinwang Liu, Kenli Li, Xiaochun Cao

2024-02-05

Abstract: Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training. This is a practical yet challenging task as it requires the model to address domain shift without incorporating target domain data into training. In this paper, we propose a novel phrase grounding-based style transfer (PGST) approach for the task. Specifically, we first define textual prompts to describe potential objects for each unseen target domain. Then, we leverage the grounded language-image pre-training (GLIP) model to learn the style of these target domains and achieve style transfer from the source to the target domain. The style-transferred source visual features are semantically rich and could be close to imaginary counterparts in the target domain. Finally, we employ these style-transferred visual features to fine-tune GLIP. By introducing imaginary counterparts, the detector could be effectively generalized to unseen target domains using only a single source domain for training. Extensive experimental results on five diverse weather driving benchmarks demonstrate that our proposed approach achieves state-of-the-art performance, even surpassing some domain adaptive methods that incorporate target domain images into the training process. The source code and pre-trained models will be made available.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the issue of Single-Domain Generalized Object Detection. Specifically:

1. **Single-Domain Generalized Object Detection Task**:
   - The goal of this task is to use data from a single source domain during training, enabling the model to generalize to multiple unseen target domains. This task is practically significant but highly challenging because it requires the model to handle domain shifts without having access to target domain data.
2. **Proposed New Method**:
   - The paper proposes a Phrase Grounding-based Style Transfer (PGST) method. By defining text prompts to describe potential objects in each unseen target domain and utilizing the GLIP model, the method achieves style transfer from the source domain to the target domain. This approach allows the visual features of the source domain to approximate their imagined counterparts in the target domain while preserving semantic information.
3. **Experimental Validation**:
   - Extensive experiments were conducted on 5 different weather driving benchmarks, and the results show that this method significantly improves mean Average Precision (mAP), even surpassing some domain adaptation methods that include target domain images.
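
The two ideas in the method summary above, per-domain text prompts and feature-level style transfer, can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how the style is applied, so the sketch assumes an AdaIN-like swap of per-channel feature statistics, and the class names, domain phrases, prompt template, and the helpers `build_prompts` and `restyle` are all hypothetical. In a prompt-driven setup the target statistics would presumably be learned against the prompts through a model such as GLIP; here they are fixed placeholders.

```python
import numpy as np

# Hypothetical class names and domain phrases; not taken from the paper.
CLASSES = ["car", "bus", "person"]
TARGET_DOMAINS = {
    "night_rainy": "in a rainy night",
    "daytime_foggy": "in a foggy day",
}


def build_prompts(classes, domain_phrase):
    """Compose one phrase per class describing how it might look in a target domain."""
    return [f"a {name} {domain_phrase}" for name in classes]


def restyle(features, target_mean, target_std, eps=1e-5):
    """AdaIN-like transfer (an assumption): replace per-channel statistics,
    keep the spatial layout of the normalized features.

    features: array of shape (C, H, W)
    target_mean, target_std: arrays of shape (C,)
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return normalized * target_std[:, None, None] + target_mean[:, None, None]


# Stand-in for source-domain backbone features (random values, not real data).
rng = np.random.default_rng(0)
source_features = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))

# Placeholder style statistics; a real pipeline would optimize these so the
# restyled features agree with the embedding of the target-domain prompts.
target_mean = np.zeros(4)
target_std = np.ones(4)

prompts = build_prompts(CLASSES, TARGET_DOMAINS["night_rainy"])
styled = restyle(source_features, target_mean, target_std)
```

After the call, each channel of `styled` carries the placeholder target statistics while the relative spatial pattern of the source features is unchanged, which is the sense in which semantic content is preserved under a change of style.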

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

Single Domain Generalization for Scene Classification Using Style-Oriented Data Augmentation

GOOD: Towards Domain Generalized Orientated Object Detection

Achieving Domain Generalization in Underwater Object Detection by Image Stylization and Domain Mixup

Domain Adaptation for Object Detection via Style Consistency

Style-Guided Adversarial Teacher for Cross-Domain Object Detection

StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors

Self-Training-Based Unsupervised Domain Adaptation for Object Detection in Remote Sensing Imagery

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

Domain-Robust Mitotic Figure Detection with Style Transfer

Towards Domain Generalization in Object Detection

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Domain-Aware Universal Style Transfer

Domain Generalization of 3D Object Detection by Density-Resampling

Object-Aware Domain Generalization for Object Detection

A Step-Wise Domain Adaptation Detection Transformer for Object Detection under Poor Visibility Conditions

Language-Aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification

Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

Frequency-based pseudo-domain generation for domain generalizable object detection