Abstract: In recent years, significant progress has been made in collecting large-scale datasets to improve segmentation and autonomous driving models. These large-scale datasets are often dominated by common environmental conditions such as "Clear and Day" weather, leading to decreased performance in under-represented conditions like "Rainy and Night". To address this issue, we introduce SynDiff-AD, a novel data augmentation pipeline that leverages diffusion models (DMs) to generate realistic images for such subgroups. SynDiff-AD uses ControlNet, a DM that guides data generation conditioned on semantic maps, along with a novel prompting scheme that generates subgroup-specific, semantically dense prompts. By augmenting datasets with SynDiff-AD, we improve the performance of segmentation models like Mask2Former and SegFormer by up to 1.2% and 2.3% on the Waymo dataset, and up to 1.4% and 0.7% on the DeepDrive dataset, respectively. Additionally, we demonstrate that our SynDiff-AD pipeline enhances the driving performance of end-to-end autonomous driving models, like AIM-2D and AIM-BEV, by up to 20% across diverse environmental conditions in the CARLA autonomous driving simulator, providing a more robust model.
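The imbalance the abstract describes can be made concrete by counting images per condition subgroup. The sketch below is only an illustration: the `(weather, time of day)` tags are hypothetical, and topping every subgroup up to the size of the largest one is an assumed balancing target, not necessarily the rule SynDiff-AD uses.

```python
from collections import Counter


def synthetic_quota(condition_tags):
    """Count images per subgroup and return how many synthetic images each
    subgroup would need to match the largest one (an assumed target)."""
    counts = Counter(condition_tags)
    if not counts:
        return {}
    largest = max(counts.values())
    return {group: largest - count for group, count in counts.items()}


# Hypothetical per-image (weather, time of day) tags for a tiny dataset.
tags = (
    [("Clear", "Day")] * 6
    + [("Rainy", "Day")] * 3
    + [("Rainy", "Night")] * 1
)
quota = synthetic_quota(tags)
```

Subgroups with a quota of zero are the dominant ones; the rest are candidates for synthetic augmentation.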

### What problem does this paper attempt to solve?

This paper addresses dataset imbalance in autonomous driving (AD) and semantic segmentation. Most existing large-scale datasets concentrate on common environmental conditions, such as sunny weather and daytime, so models perform markedly worse under rare conditions such as rain and night. To address this, the authors introduce a new data augmentation pipeline, **SynDiff-AD**, which uses diffusion models (DMs) to generate realistic images and so increase the amount of data for rare conditions.

#### Main problem description:

1. **Dataset imbalance**:
   - Existing large-scale datasets (such as Waymo and DeepDrive) mainly contain data captured under common environmental conditions, for example "sunny days" and "daytime", while data for rare conditions (such as "rainy nights") is very scarce.
   - Because of this imbalance, models perform well under common conditions but poorly under rare ones. For example, the performance of Mask2Former under "rainy nights" is 40% lower than under "sunny days".
2. **High cost of manual annotation**:
   - Collecting and annotating more data under rare conditions is both expensive and time-consuming, especially for complex tasks such as semantic segmentation and end-to-end autonomous driving (E2E AD).
3. **Limitations of existing methods**:
   - High-fidelity 3D simulation engines (such as Unity and Unreal Engine) can generate synthetic data for diverse conditions, but they have a high computational cost and require additional expert driving plans.

#### Solution:

The authors propose **SynDiff-AD**, a data augmentation pipeline based on latent diffusion models (LDMs), which addresses the problem in the following ways:

1. **Generate synthetic data**:
   - Using ControlNet (a diffusion model) and a novel prompt generation scheme, the pipeline converts images taken under common conditions into images under rare conditions while preserving their semantic content.
   - The generated synthetic data can be used directly for training without additional manual annotation.
2. **Improve model performance**:
   - Training with the synthetic data generated by SynDiff-AD improves semantic segmentation models (such as Mask2Former and SegFormer) and end-to-end autonomous driving models (such as AIM-2D and AIM-BEV). The abstract reports gains of up to 1.2% and 2.3% for Mask2Former and SegFormer on Waymo, up to 1.4% and 0.7% on DeepDrive, and up to 20% in driving performance for AIM-2D and AIM-BEV in the CARLA simulator.
3. **Efficient and economical**:
   - SynDiff-AD generates high-quality synthetic data without relying on expensive simulators or manual annotation, which balances the data distribution and improves model robustness.

### Summary

The paper targets the performance degradation that dataset imbalance causes in autonomous driving and semantic segmentation models. SynDiff-AD is a diffusion-based data augmentation method that generates realistic synthetic data for rare conditions and improves model performance under those conditions.
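The "Generate synthetic data" step can be sketched as building a generation request that swaps only the condition words in the prompt and reuses the original semantic map as both the control input and the training label. This is a rough sketch under stated assumptions: the `Sample` record, the `retarget` helper, and the prompt wording are all hypothetical, and the call to ControlNet itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    caption: str      # semantically dense description of the scene content
    weather: str      # condition tag of the real image, e.g. "Clear"
    time_of_day: str  # condition tag of the real image, e.g. "Day"
    mask_path: str    # semantic map of the real image


def retarget(sample, weather, time_of_day):
    """Build a request for one synthetic image in the target subgroup.

    The scene caption is kept and only the condition words change, so the
    semantic map can double as the label of the generated image.
    (Hypothetical prompt format, not the paper's prompting scheme.)
    """
    prompt = (
        f"{sample.caption}, {weather.lower()} weather, "
        f"{time_of_day.lower()} time"
    )
    return {
        "source_image": sample.image_id,
        "prompt": prompt,
        "control_mask": sample.mask_path,  # conditions the generator
        "label_mask": sample.mask_path,    # reused as ground truth
    }


# Hypothetical example record from the dominant "Clear and Day" subgroup.
sample = Sample(
    image_id="frame_0001",
    caption="a city street with parked cars and a cyclist",
    weather="Clear",
    time_of_day="Day",
    mask_path="masks/frame_0001.png",
)
request = retarget(sample, weather="Rainy", time_of_day="Night")
```

Because the label mask is copied from the source image, no new annotation is needed for the synthetic image, which is the property the summary highlights.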

SynDiff-AD: Improving Semantic Segmentation and End-to-End Autonomous Driving with Synthetic Data from Latent Diffusion Models
