Abstract: The diverse, high-quality content produced by recent generative models demonstrates the great potential of synthetic data for training downstream models. In vision, however, and especially in object detection, this potential remains underexplored: synthetic images are mostly used to balance the long tails of existing datasets, the accuracy of the generated labels is low, and the full capability of generative models has not been exploited. In this paper, we propose DODA, a data synthesizer that can generate high-quality object detection data for new domains in agriculture. Specifically, we improve the controllability of layout-to-image generation by encoding the layout as an image, which improves label quality. We also use a visual encoder to provide visual cues to the diffusion model, which decouples visual features from the diffusion model and enables it to generate data in new domains. On the Global Wheat Head Detection (GWHD) dataset, the largest dataset in agriculture and one that spans diverse domains, data synthesized by DODA improves object detector performance by 12.74-17.76 AP$_{50}$ in a domain that is significantly shifted from the training data.
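The abstract says the layout is encoded "as an image" rather than as text. The exact encoding DODA uses is not described here, so the sketch below only illustrates the general idea under stated assumptions: each bounding box is rasterized as a colored rectangular outline on a black canvas, with colors drawn from a hypothetical `PALETTE`. The function name `layout_to_image` and its `thickness` parameter are illustrative, not the paper's API.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; x2/y2 exclusive
Color = Tuple[int, int, int]

# Hypothetical palette; any set of visually distinct colors would do.
PALETTE: Sequence[Color] = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def layout_to_image(boxes: List[Box], height: int, width: int,
                    thickness: int = 2) -> List[List[Color]]:
    """Rasterize bounding boxes into an RGB image (black background).

    Each box is drawn as a rectangular outline whose color is picked from
    PALETTE by box index, so neighbouring boxes stay distinguishable.
    Where boxes overlap, later boxes overwrite earlier ones.
    """
    canvas = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    for idx, (x1, y1, x2, y2) in enumerate(boxes):
        color = PALETTE[idx % len(PALETTE)]
        # Clip the box to the canvas bounds.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(width, x2), min(height, y2)
        for y in range(y1, y2):
            for x in range(x1, x2):
                # A pixel is on the outline if it lies within `thickness`
                # pixels of any edge of the (clipped) box.
                on_border = (x - x1 < thickness or x2 - 1 - x < thickness or
                             y - y1 < thickness or y2 - 1 - y < thickness)
                if on_border:
                    canvas[y][x] = color
    return canvas


# Usage: one 6x6 box on a 10x10 canvas with 1-pixel outlines.
layout = layout_to_image([(2, 2, 8, 8)], height=10, width=10, thickness=1)
```

Drawing outlines rather than filled rectangles is a design choice made here so that dense, overlapping objects such as wheat heads remain separable in the conditioning image; a layout-conditioned generator would consume such an image through an image-conditioning branch instead of serializing box coordinates as text tokens.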

What problem does this paper attempt to address?

The paper addresses a problem in agricultural object detection: domain shift between training data and real-world application scenarios causes existing models to perform much worse in new scenarios. Specifically, it focuses on generating high-quality synthetic data to improve the performance of object detection models in unseen domains.

### Background

1. **Domain Shift Issue**: A main challenge for object detection in agricultural scenarios is the difference between training data and real-world application scenarios, which can cause a significant drop in model performance in practice.
2. **Limitations of Existing Methods**: Existing methods such as semi-supervised learning and pseudo-label generation can alleviate domain shift to some extent, but their effectiveness is limited when the shift is strong.
3. **Potential of Generative Models**: Generative models, especially diffusion models, have made significant progress in image generation, but their use in object detection remains limited, particularly for generating accurate labels.

### Solution

The paper proposes DODA (Diffusion for Object-detection Domain Adaptation), which addresses these issues by generating high-quality synthetic data:

1. **Improved Layout-to-Image Generation**: Encoding the layout as an image rather than as text improves both the quality of the generated images and the accuracy of the labels.
2. **Introduction of a Visual Encoder**: A pre-trained visual encoder supplies image features, allowing the diffusion model to generate domain-specific images without retraining.
3. **Domain Adaptation Design**: Because image features are decoupled from the core components of the model, the generative model can produce images in completely new domains without additional training.

### Main Contributions

1. **Layout Encoding in Image Form**: Experiments on the COCO dataset show that this method outperforms existing layout-to-image generation methods in label accuracy, achieving a new state of the art.
2. **Decoupled Design for Domain Adaptation**: The decoupled design lets the generative model produce images in new domains without additional training, and fine-tuning on data generated by DODA significantly improves object detectors of different sizes and architectures.
3. **Asymmetric Data Training**: The authors find that pre-training with more unlabeled data improves the model's ability to combine features, leading to better performance on downstream tasks.

### Experimental Validation

1. **Comparison with Layout-to-Image Methods**: Quantitative comparisons on the COCO dataset show that DODA outperforms existing layout-to-image methods on multiple metrics, especially in generating small objects and retaining detail.
2. **Validation of Object Detection Domain Adaptation**: On the GWHD dataset, fine-tuning several object detectors on synthetic data generated by DODA significantly improves all of them in the 'Terraref' domain, indicating that DODA can extract domain-specific representations and convert them into knowledge that object detectors can use.

### Conclusion

DODA addresses domain shift in agricultural object detection by generating high-quality synthetic data, significantly improving model performance in new domains. The method performs well in image generation quality, label accuracy, and domain adaptation, which suggests broad applicability.
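The abstract reports the detector gains in AP$_{50}$: average precision in which a detection counts as a true positive only if its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below is a generic, minimal illustration of that metric for a single class, not the GWHD or COCO evaluation code. It assumes greedy matching in descending score order and all-point interpolation of the precision-recall curve; COCO's official evaluation instead samples precision at 101 recall points, so values can differ slightly. The names `iou` and `ap50` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(preds: List[Tuple[str, float, Box]], gts: Dict[str, List[Box]]) -> float:
    """Single-class AP at IoU >= 0.5 with all-point interpolation.

    preds: (image_id, confidence, box) detections.
    gts:   image_id -> list of ground-truth boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tps = []
    # Process detections from most to least confident.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou and not matched[img][j]:
                best_iou, best_j = overlap, j
        if best_iou >= 0.5:
            matched[img][best_j] = True  # each ground truth matches at most once
            tps.append(1)
        else:
            tps.append(0)  # false positive
    # Cumulative precision and recall after each detection.
    precisions, recalls, tp_cum = [], [], 0
    for k, tp in enumerate(tps, start=1):
        tp_cum += tp
        precisions.append(tp_cum / k)
        recalls.append(tp_cum / n_gt)
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

The function returns AP on a 0 to 1 scale. Assuming the common convention of reporting AP as a percentage, a gain of 12.74 AP$_{50}$ corresponds to an increase of 0.1274 in this function's output.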

DODA: Diffusion for Object-detection Domain Adaptation in Agriculture
