Object Style Diffusion for Generalized Object Detection in Urban Scene

Hao Li, Xiangyuan Yang, Mengzhu Wang, Long Lan, Ke Liang, Xinwang Liu, Kenli Li
2024-12-18
Abstract: Object detection is a critical task in computer vision, with applications in various domains such as autonomous driving and urban scene monitoring. However, deep learning-based approaches often demand large volumes of annotated data, which are costly and difficult to acquire, particularly in complex and unpredictable real-world environments. This dependency significantly hampers the generalization capability of existing object detection techniques. To address this issue, we introduce a novel single-domain object detection generalization method, named GoDiff, which leverages a pre-trained model to enhance generalization in unseen domains. Central to our approach is the Pseudo Target Data Generation (PTDG) module, which employs a latent diffusion model to generate pseudo-target domain data that preserves source domain characteristics while introducing stylistic variations. By integrating this pseudo data with source domain data, we diversify the training dataset. Furthermore, we introduce a cross-style instance normalization technique to blend style features from different domains generated by the PTDG module, thereby increasing the detector's robustness. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods, achieving state-of-the-art performance in autonomous driving scenarios.
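The abstract does not spell out how cross-style instance normalization blends style features, so the following is only a minimal sketch under the assumption that it behaves like AdaIN-style statistic swapping: each feature map is normalized with its own per-channel instance mean and standard deviation, then re-scaled with the statistics of a feature map rendered in a different style. The function names `instance_stats` and `cross_style_norm` and the NumPy `(N, C, H, W)` layout are hypothetical choices for illustration, not GoDiff's implementation.

```python
import numpy as np


def instance_stats(x: np.ndarray, eps: float = 1e-5):
    """Per-sample, per-channel mean and (eps-regularized) std over the spatial dims of an (N, C, H, W) array."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    return mu, sigma


def cross_style_norm(x_a: np.ndarray, x_b: np.ndarray, eps: float = 1e-5):
    """Hypothetical sketch: swap instance-level style statistics between two feature batches.

    Returns the content of x_a carrying the style (mean/std) of x_b, and the
    content of x_b carrying the style of x_a.
    """
    mu_a, sig_a = instance_stats(x_a, eps)
    mu_b, sig_b = instance_stats(x_b, eps)
    # Remove each sample's own style, then apply the other batch's style.
    a_in_b = (x_a - mu_a) / sig_a * sig_b + mu_b
    b_in_a = (x_b - mu_b) / sig_b * sig_a + mu_a
    return a_in_b, b_in_a
```

Because only per-channel first- and second-order statistics are exchanged, the spatial layout of each feature map is unchanged, so in this sketch the source annotations would still describe the restyled features.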
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the **insufficient generalization ability of object detection models in unseen scenarios**. Existing deep-learning methods usually require large amounts of labeled data to train object detectors, and such data are costly and difficult to obtain in complex and unpredictable real-world environments. This dependence on labeled data severely limits how well current object detection techniques generalize.

To meet this challenge, the authors propose a new single-domain generalization method for object detection called **GoDiff**, which improves generalization to unseen domains through the following components:

1. **Pseudo Target Data Generation (PTDG) module**: A pre-trained latent diffusion model generates pseudo-target-domain data that retain the characteristics of the source domain while introducing style changes. Combining these pseudo data with the source-domain data enriches the training set and increases its diversity.
2. **Cross-Style Instance Normalization (CSN)**: Exchanging style features between data generated for different domains further enhances the robustness of the model and improves its generalization.
3. **Dual-prompt strategy**: Global and local prompts guide the diffusion model to generate diverse virtual images, so that the generated images combine style diversity with semantic consistency.
4. **Object filtering mechanism**: A CLIP-RBF kernel measures the semantic difference between generated and real images, so that low-quality objects are removed and the quality of the dataset improves.
5. **Covariance Matching Loss (CML)**: Minimizing the differences between the covariance matrices of features with different styles pushes the model to learn style-invariant features.

Experimental results show that GoDiff not only enhances the generalization ability of existing detectors but can also serve as a plug-and-play enhancement for other single-domain generalization methods, achieving state-of-the-art performance in autonomous driving scenarios.

In summary, the paper tackles the insufficient generalization of object detectors in unseen scenarios with a method that generates diverse, high-quality training data, thereby improving the generalization performance of the model.
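The covariance matching loss can be illustrated with a CORAL-style formulation. This is an assumption: the summary does not state the exact normalization, which network layer the features come from, or how the two styles are paired. The sketch computes a per-sample channel covariance matrix for two feature batches of the same images rendered in different styles and returns their mean squared Frobenius distance, scaled by the number of matrix entries; `channel_covariance` and `covariance_matching_loss` are hypothetical names.

```python
import numpy as np


def channel_covariance(feat: np.ndarray) -> np.ndarray:
    """Per-sample channel covariance of an (N, C, H, W) feature map, shape (N, C, C)."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    f = f - f.mean(axis=2, keepdims=True)  # center each channel over spatial positions
    # Unbiased estimate (divide by H*W - 1), matching numpy.cov's default.
    return f @ f.transpose(0, 2, 1) / (h * w - 1)


def covariance_matching_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Mean over samples of ||Cov_a - Cov_b||_F^2, divided by C^2 (hypothetical scaling)."""
    diff = channel_covariance(feat_a) - channel_covariance(feat_b)
    c = feat_a.shape[1]
    return float((diff ** 2).sum(axis=(1, 2)).mean() / (c * c))
```

In actual training the same computation would be written in an autograd framework so it can be minimized together with the detection loss; NumPy is used here only to make the quantity concrete.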
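The object filtering mechanism can be sketched as follows, assuming CLIP image embeddings have already been extracted for each generated object crop and for the source object it was generated from (random vectors can stand in for real CLIP features when trying the code). An RBF kernel on L2-normalized embeddings scores how close each generated object stays to its source, and objects scoring below a threshold are dropped. The bandwidth `sigma`, the threshold `tau`, the one-to-one pairing of generated and source objects, and the helper names are hypothetical choices, not details from the paper.

```python
import numpy as np


def l2_normalize(e: np.ndarray) -> np.ndarray:
    """Scale each row of an (M, D) embedding matrix to unit length."""
    return e / np.linalg.norm(e, axis=1, keepdims=True)


def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """Row-wise RBF similarity exp(-||x_i - y_i||^2 / (2 sigma^2)) for paired embeddings."""
    sq_dist = ((x - y) ** 2).sum(axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def filter_objects(gen_emb: np.ndarray, src_emb: np.ndarray,
                   sigma: float = 0.5, tau: float = 0.5):
    """Return (indices of generated objects to keep, per-object kernel scores)."""
    scores = rbf_kernel(l2_normalize(gen_emb), l2_normalize(src_emb), sigma)
    return np.flatnonzero(scores >= tau), scores
```

For unit vectors $\|x-y\|^2 = 2 - 2\cos\theta$, so with `sigma=0.5` and `tau=0.5` an object is kept only when the cosine similarity between its embedding and its source's embedding is at least about 0.83.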