Abstract: Object detection is a critical task in computer vision, with applications in various domains such as autonomous driving and urban scene monitoring. However, deep learning-based approaches often demand large volumes of annotated data, which are costly and difficult to acquire, particularly in complex and unpredictable real-world environments. This dependency significantly hampers the generalization capability of existing object detection techniques. To address this issue, we introduce a novel single-domain object detection generalization method, named GoDiff, which leverages a pre-trained model to enhance generalization in unseen domains. Central to our approach is the Pseudo Target Data Generation (PTDG) module, which employs a latent diffusion model to generate pseudo-target domain data that preserves source domain characteristics while introducing stylistic variations. By integrating this pseudo data with source domain data, we diversify the training dataset. Furthermore, we introduce a cross-style instance normalization technique to blend style features from different domains generated by the PTDG module, thereby increasing the detector's robustness. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods, achieving state-of-the-art performance in autonomous driving scenarios.
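The cross-style instance normalization idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes an AdaIN-style operation: each channel of the source features is normalized with its own mean and standard deviation, then rescaled with statistics blended toward a pseudo-target feature map. The function names, the `alpha` mixing weight, and the toy feature values are all hypothetical.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one flattened feature channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(var + eps)

def cross_style_instance_norm(content, style, alpha=0.5, eps=1e-5):
    """Re-style `content` features with statistics blended toward `style`.

    content, style: lists of channels, each a flat list of floats.
    alpha: assumed mixing weight (0 keeps the content style, 1 adopts the
    style statistics); not taken from the paper.
    """
    out = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch, eps)
        s_mean, s_std = channel_stats(s_ch, eps)
        # Blend the per-channel statistics of the two domains.
        mix_mean = (1 - alpha) * c_mean + alpha * s_mean
        mix_std = (1 - alpha) * c_std + alpha * s_std
        # Normalize the content channel, then apply the blended style.
        out.append([(v - c_mean) / c_std * mix_std + mix_mean for v in c_ch])
    return out

# Toy features: 2 channels with 4 spatial positions each.
source_feat = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
pseudo_feat = [[10.0, 12.0, 14.0, 16.0], [5.0, 5.0, 5.0, 9.0]]
mixed = cross_style_instance_norm(source_feat, pseudo_feat, alpha=1.0)
```

With `alpha=1.0` each output channel takes on the mean of the pseudo-target channel while keeping the spatial pattern of the source channel; with `alpha=0.0` the source features come back unchanged up to floating-point error.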


### What problem does this paper attempt to solve?

This paper addresses the **insufficient generalization ability of object detection models in unseen scenarios**. Existing deep-learning methods usually require a large amount of labeled data to train object detection models, and such data is costly and difficult to obtain in complex, unpredictable real-world environments. This dependence on labeled data severely limits the generalization ability of existing object detection techniques.

To meet this challenge, the authors propose a new single-domain object detection generalization method called **GoDiff**. It improves generalization to unseen domains through the following components:

1. **Pseudo Target Data Generation (PTDG) module**: Uses a pre-trained diffusion model to generate pseudo-target-domain data that retains the characteristics of the source domain while introducing style changes. Combining this pseudo data with the source-domain data enriches the training set and increases its diversity.
2. **Cross-Style Instance Normalization (CSN) technique**: Exchanges style characteristics between the data of the different domains generated by the PTDG module, which further improves the robustness and generalization ability of the detector.
3. **Dual-prompt Strategy**: Guides the diffusion model with global and local prompts so that the generated images are diverse in style yet semantically consistent.
4. **Object Filtering Mechanism**: Uses the CLIP-RBF kernel to evaluate the semantic difference between generated images and real images, removing low-quality objects and improving the quality of the dataset.
5. **Covariance Matching Loss (CML)**: Minimizes the difference between the covariance matrices of features with different styles, so that the model learns style-invariant features.

The experimental results show that GoDiff not only enhances the generalization ability of existing detectors but can also serve as a plug-and-play enhancement for other single-domain generalization methods, achieving state-of-the-art performance in autonomous driving scenarios.

In summary, the paper tackles the insufficient generalization ability of object detection models in unseen scenarios by generating diverse, high-quality training data, which improves the generalization performance of the model.
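Two of the components listed above, the RBF-kernel object filter and the covariance matching loss, lend themselves to a small illustration. The summary does not state the exact formulas, so this is a sketch under stated assumptions: embeddings are taken as precomputed vectors (for example from CLIP, which is not called here), the kernel is the standard RBF kernel exp(-gamma * ||u - v||^2), and the loss is the mean squared difference between two channel covariance matrices. The function names, `gamma`, `threshold`, and the toy data are hypothetical.

```python
import math

def rbf_similarity(u, v, gamma=1.0):
    """Standard RBF kernel between two embedding vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def filter_objects(generated, reference, threshold=0.5, gamma=1.0):
    """Return indices of generated embeddings that stay close to their reference.

    `threshold` is an assumed cut-off, not a value from the paper.
    """
    return [i for i, (g, r) in enumerate(zip(generated, reference))
            if rbf_similarity(g, r, gamma) >= threshold]

def covariance(features):
    """Channel covariance of `features`: N samples, each a list of C channels."""
    n, c = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(c)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / n
             for j in range(c)] for i in range(c)]

def covariance_matching_loss(feat_a, feat_b):
    """Mean squared difference between the two covariance matrices."""
    cov_a, cov_b = covariance(feat_a), covariance(feat_b)
    c = len(cov_a)
    return sum((cov_a[i][j] - cov_b[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

# Toy embeddings: the second generated object drifts far from its reference.
real_emb = [[1.0, 0.0], [0.0, 1.0]]
gen_emb = [[1.0, 0.1], [3.0, 1.0]]
kept = filter_objects(gen_emb, real_emb)

# Toy features in two styles with opposite channel correlation.
style_a = [[0.0, 0.0], [2.0, 2.0]]
style_b = [[0.0, 0.0], [2.0, -2.0]]
loss = covariance_matching_loss(style_a, style_b)
```

In this toy run only the first generated object passes the filter, and the loss is positive because the two styles have opposite channel correlation. Adding a constant offset to every sample leaves the covariance unchanged, so the loss ignores pure mean shifts between styles.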

Object Style Diffusion for Generalized Object Detection in Urban Scene

Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

Towards Domain Generalization in Object Detection

Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Object-Aware Domain Generalization for Object Detection

Domain Generalization of 3D Object Detection by Density-Resampling

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

Detector Guidance for Multi-Object Text-to-Image Generation

GOOD: Towards Domain Generalized Orientated Object Detection

Domain Adaptation for Object Detection via Style Consistency

Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond

Achieving Domain Generalization in Underwater Object Detection by Image Stylization and Domain Mixup