Abstract: Computer vision tasks such as object detection and segmentation rely on the availability of extensive, accurately annotated datasets. In this work, we present CIA, a modular pipeline for (1) generating synthetic images for dataset augmentation using Stable Diffusion, (2) filtering out low-quality samples using defined quality metrics, and (3) forcing the existence of specific patterns in generated images using accurate prompting and ControlNet. To show how CIA can be used to search for an optimal augmentation pipeline for training data, we study human object detection in a data-constrained scenario, using YOLOv8n on the COCO and Flickr30k datasets. We recorded a significant improvement using CIA-generated images, approaching the performance obtained when doubling the number of real images in the dataset. Our findings suggest that our modular framework can significantly enhance object detection systems and enable future research on data-constrained scenarios. The framework is available at: http://github.com/multitel-ai/CIA

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses the lack of high-quality, accurately labeled datasets for computer vision tasks such as object detection and segmentation. Specifically, the authors propose a controllable image augmentation framework named CIA that generates synthetic images for dataset augmentation. The paper focuses on the following aspects:

1. **Generating synthetic images**: Use the Stable Diffusion model to generate new images that increase the diversity and size of the dataset.
2. **Filtering low-quality samples**: Screen out low-quality synthetic images using defined quality-assessment metrics, so that only useful data reaches training.
3. **Controlling the generation process**: Use ControlNet and precise prompts to force specific patterns or features to appear in the generated images, as required by the target task.

To verify the effectiveness of CIA, the authors studied human detection under data-limited conditions and ran experiments on the COCO and Flickr30k datasets using the YOLOv8n model. The results show that the synthetic images generated by CIA significantly improve model performance, approaching the effect of doubling the number of real images.

### Core problems of the paper

- **Data scarcity and high labeling cost**: High-quality, accurately labeled datasets are difficult to create, which leaves too little data for training and hurts model performance.
- **Limitations of traditional data augmentation methods**: Traditional data augmentation methods (such as rotation, flipping, and color adjustment) perform only simple transformations and cannot introduce completely new information.
- **How to assess the quality of synthetic images**: Generated images need an effective assessment of their quality and relevance to ensure that they actually help model training.

### Solutions

The CIA framework addresses these problems through the following modules:

1. **Extraction module**: Extract features from the original images to preserve the intrinsic characteristics of the dataset.
2. **Generation module**: Combine the extracted features with text prompts to generate new images.
3. **Quality-assessment module**: Use predefined quality-assessment metrics to select high-quality synthetic images.
4. **Training and testing module**: Train different models to explore how different combinations of original and synthetic data affect task performance.

Through these modules, the CIA framework generates high-quality synthetic images and significantly improves the performance of object detection models under data-limited conditions.
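
The quality-assessment and training steps described above can be sketched roughly as follows. This is a minimal illustration, not the CIA implementation: the `Sample` class, the `quality` score (assumed here to be "higher is better"), the `ratio` parameter, and both helper functions are hypothetical names, and random numbers stand in for real quality metrics.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """One training image. All fields are hypothetical placeholders."""
    image_id: str
    synthetic: bool
    quality: float = 1.0  # assumption: higher is better; real images are not scored


def filter_by_quality(candidates, keep):
    """Return the `keep` highest-scoring synthetic candidates."""
    ranked = sorted(candidates, key=lambda s: s.quality, reverse=True)
    return ranked[:keep]


def build_training_set(real, synthetic, ratio, seed=0):
    """Mix all real images with about `ratio * len(real)` filtered synthetic ones."""
    n_synth = min(len(synthetic), round(ratio * len(real)))
    mixed = list(real) + filter_by_quality(synthetic, n_synth)
    random.Random(seed).shuffle(mixed)  # deterministic shuffle for reproducibility
    return mixed


# Toy data: 4 "real" images and 10 "generated" ones with made-up quality scores.
real = [Sample(f"real_{i}", synthetic=False) for i in range(4)]
rng = random.Random(42)
generated = [
    Sample(f"gen_{i}", synthetic=True, quality=rng.random()) for i in range(10)
]

# Sweep the synthetic-to-real ratio, as one would when comparing data mixes.
for ratio in (0.0, 0.5, 1.0):
    train = build_training_set(real, generated, ratio)
    n_synth = sum(s.synthetic for s in train)
    print(f"ratio={ratio}: {len(train)} images, {n_synth} synthetic")
```

In a real experiment, each mix produced by the sweep would be used to train a separate detector, and the resulting models would be compared on the same held-out set of real images.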

CIA: Controllable Image Augmentation Framework Based on Stable Diffusion

ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

CamDiff: Camouflage Image Augmentation via Diffusion Model

Stable Diffusion for Data Augmentation in COCO and Weed Datasets

Diffusion-based Data Augmentation for Object Counting Problems

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Generating Images with 3D Annotations Using Diffusion Models

TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces

CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud

CD-COCO: A Versatile Complex Distorted COCO Database for Scene-Context-Aware Computer Vision

Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Not All Steps Are Created Equal: Selective Diffusion Distillation for Image Manipulation

MAISI: Medical AI for Synthetic Imaging

Stable Diffusion For Aerial Object Detection

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior