CIA: Controllable Image Augmentation Framework Based on Stable Diffusion

Mohamed Benkedadra, Dany Rimez, Tiffanie Godelaine, Natarajan Chidambaram, Hamed Razavi Khosroshahi, Horacio Tellez, Matei Mancas, Benoit Macq, Sidi Ahmed Mahmoudi
DOI: https://doi.org/10.1109/MIPR62202.2024.00102
2024-11-25
Abstract: Computer vision tasks such as object detection and segmentation rely on the availability of extensive, accurately annotated datasets. In this work, we present CIA, a modular pipeline for (1) generating synthetic images for dataset augmentation using Stable Diffusion, (2) filtering out low-quality samples using defined quality metrics, and (3) forcing the existence of specific patterns in generated images using accurate prompting and ControlNet. To show how CIA can be used to search for an optimal augmentation pipeline of training data, we study human object detection in a data-constrained scenario, using YOLOv8n on the COCO and Flickr30k datasets. We recorded a significant improvement using CIA-generated images, approaching the performance obtained when doubling the amount of real images in the dataset. Our findings suggest that our modular framework can significantly enhance object detection systems and enable future research on data-constrained scenarios. The framework is available at: http://github.com/multitel-ai/CIA
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problem does this paper attempt to address?

This paper addresses the shortage of high-quality, accurately labeled datasets in computer vision tasks such as object detection and segmentation. The authors propose CIA, a controllable image augmentation framework that generates synthetic images to expand a dataset. The paper focuses on three aspects:

1. **Generating synthetic images**: The Stable Diffusion model generates new images to increase the diversity and size of the dataset.
2. **Filtering low-quality samples**: Defined quality-assessment metrics screen out low-quality synthetic images, so that only useful samples are used for training.
3. **Controlling the generation process**: ControlNet and precise prompts introduce specific patterns or features into the generated images to meet the requirements of the target task.

To verify the effectiveness of CIA, the authors study human detection under data-limited conditions, running experiments on the COCO and Flickr30k datasets with the YOLOv8n model. The results show that the synthetic images generated by CIA significantly improve model performance, approaching the effect of doubling the number of real images.

### Core problems of the paper

- **Data scarcity and high labeling cost**: High-quality, accurately labeled datasets are difficult to create, which leaves too little data for training and degrades model performance.
- **Limitations of traditional data augmentation methods**: Traditional augmentations (rotation, flipping, color adjustment, and so on) perform only simple transformations and cannot introduce entirely new information.
- **How to assess the quality of synthetic images**: Generated images need an effective assessment of their quality and relevance to ensure that they actually help model training.

### Solutions

The CIA framework addresses these problems through the following modules:

1. **Extraction module**: Extracts features from the original images to preserve the intrinsic characteristics of the dataset.
2. **Generation module**: Combines the extracted features with text prompts to generate new images.
3. **Quality-assessment module**: Uses predefined quality-assessment metrics to select high-quality synthetic images.
4. **Training and testing module**: Trains different models to explore how different combinations of original and synthetic data affect task performance.

Through these modules, the CIA framework generates high-quality synthetic images and significantly improves the performance of object detection models under data-limited conditions.
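
The quality-assessment and data-mixing steps described above can be illustrated with a minimal Python sketch. It is not the CIA implementation: the `Sample` class, the `quality` score, the `threshold`, and the `ratio` parameter are all hypothetical names introduced here, and the score stands in for whichever quality metric the framework actually computes. Image generation with Stable Diffusion and ControlNet is left out; the sketch assumes the synthetic images already exist and carry a score.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training image. `quality` is a hypothetical score in [0, 1]."""
    path: str
    synthetic: bool
    quality: float = 1.0


def filter_by_quality(samples, threshold):
    """Keep only the samples whose score meets the (assumed) threshold."""
    return [s for s in samples if s.quality >= threshold]


def mix_datasets(real, synthetic, ratio, seed=0):
    """Return all real samples plus up to ratio * len(real) synthetic ones.

    `ratio` is an illustrative knob: 1.0 means "as many synthetic images
    as real ones", capped by how many synthetic samples are available.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    n_synthetic = min(len(synthetic), round(ratio * len(real)))
    return list(real) + rng.sample(list(synthetic), n_synthetic)


# Toy data: four real images and five scored synthetic candidates.
real = [Sample(f"real_{i}.jpg", synthetic=False) for i in range(4)]
scores = [0.9, 0.2, 0.7, 0.4, 0.8]
candidates = [
    Sample(f"synth_{i}.jpg", synthetic=True, quality=q)
    for i, q in enumerate(scores)
]

kept = filter_by_quality(candidates, threshold=0.5)
train_set = mix_datasets(real, kept, ratio=0.5)
```

Varying `ratio` and `threshold` and retraining the detector on each resulting `train_set` mirrors, in spirit, how the training and testing module compares different combinations of original and synthetic data.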