The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models

Roy Voetman, Maya Aghaei, Klaas Dijkstra
2023-06-16
Abstract: Despite the notable accomplishments of deep object detection models, a major challenge that persists is the requirement for extensive amounts of training data. The process of procuring such real-world data is a laborious undertaking, which has prompted researchers to explore new avenues of research, such as synthetic data generation techniques. This study presents a framework for the generation of synthetic datasets by fine-tuning pretrained stable diffusion models. The synthetic datasets are then manually annotated and employed for training various object detection models. These detectors are evaluated on a real-world test set of 331 images and compared against a baseline model that was trained on real-world images. The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model. In the context of apple detection in orchards, the average precision deviation with the baseline ranges from 0.09 to 0.12. This study illustrates the potential of synthetic data generation techniques as a viable alternative to the collection of extensive training data for the training of deep models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the large amount of training data that deep object detection models require. Collecting real-world data at that scale is time-consuming and labor-intensive, which has led researchers to explore alternatives such as synthetic data generation. The paper proposes a framework that fine-tunes a pretrained stable diffusion model to generate synthetic datasets, which are then manually annotated and used to train several object detection models. To validate the approach, the study uses apple detection in orchards as the experimental task and evaluates the detectors on an established benchmark dataset, with a real-world test set of 331 images. Detectors trained on synthetic data perform comparably to a baseline model trained on real images: their average precision deviates from the baseline by 0.09 to 0.12. This suggests that synthetic data generation can be a viable alternative that reduces the need for large-scale real-world datasets.
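To make the reported metric concrete, the following is a minimal sketch of how an average precision (AP) score and its deviation from a baseline could be computed for a single class such as "apple". It is an illustration under stated assumptions, not the paper's evaluation code: the excerpt does not specify the AP variant or the IoU threshold, so the sketch assumes all-point interpolation and assumes that each detection has already been matched against ground truth and labeled as a true or false positive. The function names `average_precision` and `ap_deviation` are hypothetical, and the deviation is taken as an absolute difference because the excerpt does not state a sign convention.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """All-point interpolated AP for one class (assumed variant).

    detections: (confidence, is_true_positive) pairs; the matching of
    detections to ground-truth boxes is assumed to be done beforehand.
    num_ground_truth: total number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def ap_deviation(baseline_ap: float, synthetic_ap: float) -> float:
    """Absolute AP gap between the real-data baseline and a synthetic-data model."""
    return abs(baseline_ap - synthetic_ap)
```

In use, one would compute `average_precision` once for the baseline detector and once for each detector trained on synthetic data, all on the same real-world test set, and then pass the two scores to `ap_deviation`.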