DODA: Diffusion for Object-detection Domain Adaptation in Agriculture

Shuai Xiang, Pieter M. Blok, James Burridge, Haozhou Wang, Wei Guo
2024-03-27
Abstract: The diverse and high-quality content generated by recent generative models demonstrates the great potential of using synthetic data to train downstream models. However, in vision, and especially in object detection, this direction is not fully explored: synthetic images are merely used to balance the long tails of existing datasets, the accuracy of the generated labels is low, and the full potential of generative models has not been exploited. In this paper, we propose DODA, a data synthesizer that can generate high-quality object detection data for new domains in agriculture. Specifically, we improve the controllability of layout-to-image generation by encoding the layout as an image, thereby improving the quality of labels, and we use a visual encoder to provide visual clues for the diffusion model, which decouples visual features from the diffusion model and gives the model the ability to generate data in new domains. On the Global Wheat Head Detection (GWHD) Dataset, which is the largest dataset in agriculture and contains diverse domains, using the data synthesized by DODA improves the performance of the object detector by 12.74-17.76 AP$_{50}$ in the domain that was significantly shifted from the training data.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the issue in object detection within the agricultural domain, where domain shift between training data and real-world application scenarios leads to a significant drop in the performance of existing models in new scenarios. Specifically, the paper focuses on how to generate high-quality synthetic data to improve the performance of object detection models in unseen new domains.

### Background

1. **Domain Shift Issue**: One of the main challenges in object detection in agricultural scenarios is the difference between training data and real-world application scenarios. This difference can lead to a significant drop in model performance in actual applications.
2. **Limitations of Existing Methods**: Existing methods, such as semi-supervised learning and pseudo-label generation, can alleviate the domain shift problem to some extent, but their effectiveness is limited when the domain shift is too strong.
3. **Potential of Generative Models**: In recent years, generative models (especially diffusion models) have made significant progress in image generation, but their application in object detection tasks is still insufficient, particularly in generating high-quality labels.

### Solution

The paper proposes a method called DODA (Diffusion for Object-detection Domain Adaptation) to address the above issues by generating high-quality synthetic data. Specifically:

1. **Improved Layout-to-Image Generation**: By encoding the layout as an image rather than text, the quality of generated images and the accuracy of labels are improved.
2. **Introduction of Visual Encoder**: Using a pre-trained visual encoder to provide image features allows the diffusion model to generate domain-specific images without retraining.
3. **Domain Adaptation Design**: By decoupling image features and the core components of the model, the generative model can generate images in completely new domains without additional training.

### Main Contributions

1. **Layout Encoding in Image Form**: Experimental results on the COCO dataset show that this method outperforms existing layout-to-image generation methods in terms of label accuracy, achieving a new state-of-the-art level.
2. **Decoupled Design for Domain Adaptation**: By decoupling image features and the core components of the model, the generative model can generate images in completely new domains without additional training. Fine-tuning with data generated by DODA significantly improves the performance of object detectors of different sizes and architectures.
3. **Asymmetric Data Training**: It is found that pre-training with more unlabeled data can improve the model's feature combination ability, resulting in better performance in downstream tasks.

### Experimental Validation

1. **Comparison with Text-to-Image Methods**: Quantitative comparison results on the COCO dataset show that DODA outperforms existing text-to-image methods on multiple metrics, especially in generating small objects and retaining details.
2. **Validation of Object Detection Domain Adaptation**: On the GWHD dataset, fine-tuning multiple object detectors with synthetic data generated by DODA significantly improves the performance of all models in the 'Terraref' domain, indicating that DODA can effectively extract domain-specific representations and convert them into knowledge that object detection models can utilize.

### Conclusion

DODA effectively addresses the domain shift issue in object detection within the agricultural domain by generating high-quality synthetic data, significantly improving model performance in new domains. The method excels in image generation quality, label accuracy, and domain adaptation capability, showing broad application prospects.
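
To make the "layout encoded as an image" idea more concrete, the following is a minimal sketch of how bounding boxes might be rasterized into a single-channel layout image that a generator could be conditioned on. It is not DODA's actual encoder: the function name `render_layout`, the `(x_min, y_min, x_max, y_max)` box convention, and the choice of filled rectangles are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box convention: pixel coordinates, max edges exclusive.
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def render_layout(boxes: List[Box], width: int, height: int) -> List[List[int]]:
    """Rasterize boxes into a height x width grid (0 = background, 1 = object)."""
    canvas = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the canvas so out-of-range labels cannot raise.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x0 >= x1 or y0 >= y1:
            continue  # box is empty or lies fully outside the canvas
        for y in range(y0, y1):
            for x in range(x0, x1):
                canvas[y][x] = 1
    return canvas


# Two illustrative boxes on a 4x4 canvas; the second is partly off-canvas.
layout = render_layout([(1, 1, 3, 3), (2, 0, 10, 2)], width=4, height=4)
for row in layout:
    print(" ".join(str(v) for v in row))
```

Because the same boxes that produce the layout image also serve as the detection labels, the label and the conditioning signal stay aligned pixel for pixel. Filled rectangles merge overlapping objects into one region, so a real encoder would likely need outlines or per-instance values to keep them separable; that design choice is left open here.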