Abstract: Data augmentation is a widely used technique for creating training data for tasks that require labeled data, such as semantic segmentation. This method benefits pixel-wise annotation tasks requiring much effort and intensive labor. Traditional data augmentation methods involve simple transformations like rotations and flips to create new images from existing ones. However, these new images may lack diversity along the main semantic axes in the data and may not change high-level semantic properties. To address this issue, generative models have emerged as an effective solution for augmenting data by generating synthetic images. Controllable generative models offer a way to augment data for semantic segmentation tasks using a prompt and visual reference from the original image. However, using these models directly presents challenges, such as creating an effective prompt and visual reference to generate a synthetic image that accurately reflects the content and structure of the original. In this work, we introduce an effective data augmentation method for semantic segmentation using the Controllable Diffusion Model. Our proposed method includes efficient prompt generation using Class-Prompt Appending and Visual Prior Combination to enhance attention to labeled classes in real images. These techniques allow us to generate images that accurately depict segmented classes in the real image. In addition, we employ a class balancing algorithm to ensure efficiency when merging the synthetic and original images to generate balanced data for the training dataset. We evaluated our method on the PASCAL VOC datasets and found it highly effective for synthesizing images in semantic segmentation.

What problem does this paper attempt to address?

The paper addresses the following problem: in semantic segmentation, how can data augmentation improve model performance while avoiding the time and cost of annotating new datasets? Traditional data augmentation methods (simple transformations such as rotation and flipping) cannot generate new images that are diverse in their high-level semantic attributes. Generative models can produce synthetic images, but applying them directly is difficult because the generated images may not accurately reflect the content and structure of the original.

To solve these problems, the authors propose a data augmentation method based on the Controllable Diffusion Model that generates high-quality synthetic images to supplement the original dataset. The method includes the following key steps:

1. **Text Prompt Construction**: Construct a more accurate text prompt by combining the description generated by an image captioning model (such as BLIP-2) with the class labels present in the image, where \( P^g_i \) is the generated caption and \( P^c_i \) is the class-label prompt for image \( i \): \[ P^*_i = "P^g_i; P^c_i" \]
2. **Visual Prior Combination**: Combine the visual prior of the image (such as the result of line art edge detection), \( V^{I}_i \), with the segmentation map, \( V^{S}_i \), using weights \( \omega_1 \) and \( \omega_2 \), so that the layout of the generated image is clear and the label information is retained: \[ V^*_i = \omega_1 V^{I}_i + \omega_2 V^{S}_i \]
3. **Class Balancing Algorithm**: Ensure that the generated synthetic images are evenly distributed among the classes to prevent over-representation of certain classes.
4. **No Post-filtering**: Use the generated images directly for training, which demonstrates the effectiveness of the proposed method, and integrate filters only when necessary to verify compatibility.

Through these improvements, the method can significantly improve the performance of semantic segmentation models without increasing the cost of manual annotation, and it performs especially well on small-sample datasets.
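Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_prompt` and `combine_visual_priors`, the comma-joined class list, the default weights of 0.5, the clipping to [0, 255], and the use of nested lists in place of real image arrays are all choices made for this sketch.

```python
from typing import List, Sequence

# A tiny grayscale image as rows of pixel values in [0, 255] (stand-in for an array).
Gray = List[List[float]]


def build_prompt(caption: str, class_names: Sequence[str]) -> str:
    """Class-Prompt Appending: P*_i = "P^g_i; P^c_i".

    Joining the class names with ", " is an assumption of this sketch.
    """
    class_prompt = ", ".join(class_names)
    return f"{caption}; {class_prompt}"


def combine_visual_priors(line_art: Gray, seg_map: Gray,
                          w1: float = 0.5, w2: float = 0.5) -> Gray:
    """Pixel-wise weighted sum V*_i = w1 * V^I_i + w2 * V^S_i, clipped to [0, 255]."""
    same_shape = len(line_art) == len(seg_map) and all(
        len(a) == len(b) for a, b in zip(line_art, seg_map))
    if not same_shape:
        raise ValueError("line art and segmentation map must have the same shape")
    return [
        [min(255.0, max(0.0, w1 * a + w2 * b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(line_art, seg_map)
    ]


# Hypothetical inputs: a caption as a captioning model might produce it,
# plus the class labels annotated in the same image.
prompt = build_prompt("a man riding a horse on a beach", ["person", "horse"])

# 2x2 toy "line art" and "segmentation map" images.
combined = combine_visual_priors([[0, 255], [255, 0]], [[100, 100], [200, 200]])
```

The resulting prompt and combined prior would then be passed to a controllable diffusion model as the text and visual conditions; that call is omitted here because it depends on the specific model and library used.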
Experimental results show that after adding the augmented data, the performance of multiple semantic segmentation models (such as DeepLabV3+, PSPNet, and Mask2Former) improves significantly.

### Key Formula Summary

- Text Prompt Construction Formula: \[ P^*_i = "P^g_i; P^c_i" \]
- Visual Prior Combination Formula: \[ V^*_i = \omega_1 V^{I}_i + \omega_2 V^{S}_i \]
- Class Balancing Algorithm Output Formula: \[ D_{\text{final}} = D_{\text{gen}} \cup D_{\text{origin}} \]

These techniques work together so that the generated synthetic images are not only visually close to real images but also more reasonable and accurate in class distribution and semantic information.
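The summary does not spell out how the class balancing algorithm chooses images, so the following is only one plausible sketch of the idea behind \( D_{\text{final}} = D_{\text{gen}} \cup D_{\text{origin}} \). It assumes a greedy rule: repeatedly find the least-represented class and pick an original image containing it as the source for one synthetic image. The function name `select_for_generation`, the `budget` parameter, the tie-breaking rules, and the `_syn` naming of generated items are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def select_for_generation(image_classes: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick source images so synthetic copies even out class counts.

    Assumed rule (not taken from the paper): at each step, find the rarest
    class and generate from an original image that contains it.
    """
    counts = Counter(c for classes in image_classes.values() for c in classes)
    if not counts:
        return []
    usage: Counter = Counter()  # how many times each source image was chosen
    selected: List[str] = []
    for _ in range(budget):
        # Rarest class so far; sorting makes tie-breaking deterministic.
        rarest = min(sorted(counts), key=lambda c: counts[c])
        candidates = sorted(n for n, cls in image_classes.items() if rarest in cls)
        # Prefer the least-used source, then the one adding the fewest classes.
        source = min(candidates, key=lambda n: (usage[n], len(image_classes[n])))
        selected.append(source)
        usage[source] += 1
        # A synthetic image is assumed to keep the classes of its source.
        counts.update(image_classes[source])
    return selected


# Toy dataset: "person" is over-represented, "dog" and "cat" are rare.
origin = {
    "img_a": {"person", "dog"},
    "img_b": {"person"},
    "img_c": {"person", "cat"},
}
selected = select_for_generation(origin, budget=2)

d_origin = sorted(origin)
d_gen = [f"{name}_syn{k}" for k, name in enumerate(selected)]
d_final = d_origin + d_gen  # D_final = D_gen ∪ D_origin
```

In this toy run the two generation slots go to the images containing the rare classes rather than to `img_b`, which contains only the most frequent class.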

Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance

Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation from Scratch

Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation

Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

DIAGen: Diverse Image Augmentation with Generative Models

VSGAN: Visual Saliency Guided Generative Adversarial Network for Data Augmentation

Diverse Data Augmentation for Learning Image Segmentation with Cross-Modality Annotations.

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

A Simple Background Augmentation Method for Object Detection with Diffusion Model

3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing

Effective Data Augmentation With Diffusion Models

Smart(Sampling)Augment: Optimal and Efficient Data Augmentation for Semantic Segmentation

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing

Learning to Augment: Hallucinating Data for Domain Generalized Segmentation

Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation