SynDiff-AD: Improving Semantic Segmentation and End-to-End Autonomous Driving with Synthetic Data from Latent Diffusion Models

Harsh Goel, Sai Shankar Narasimhan, Oguzhan Akcin, Sandeep Chinchali
2024-11-25
Abstract: In recent years, significant progress has been made in collecting large-scale datasets to improve segmentation and autonomous driving models. These large-scale datasets are often dominated by common environmental conditions such as "Clear and Day" weather, leading to decreased performance in under-represented conditions like "Rainy and Night". To address this issue, we introduce SynDiff-AD, a novel data augmentation pipeline that leverages diffusion models (DMs) to generate realistic images for such subgroups. SynDiff-AD uses ControlNet, a DM that guides data generation conditioned on semantic maps, along with a novel prompting scheme that generates subgroup-specific, semantically dense prompts. By augmenting datasets with SynDiff-AD, we improve the performance of segmentation models like Mask2Former and SegFormer by up to 1.2% and 2.3% on the Waymo dataset, and up to 1.4% and 0.7% on the DeepDrive dataset, respectively. Additionally, we demonstrate that our SynDiff-AD pipeline enhances the driving performance of end-to-end autonomous driving models, like AIM-2D and AIM-BEV, by up to 20% across diverse environmental conditions in the CARLA autonomous driving simulator, providing a more robust model.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses dataset imbalance in autonomous driving (AD) and semantic segmentation. Most existing large-scale datasets are dominated by common environmental conditions, such as clear weather and daytime, so models perform markedly worse under rare conditions such as rain and night. To address this, the authors introduce **SynDiff-AD**, a data augmentation pipeline that uses diffusion models (DMs) to generate realistic images and thereby increase the amount of data for rare conditions.

#### Main problem description:

1. **Dataset imbalance**:
   - Existing large-scale datasets (such as Waymo and DeepDrive) mainly contain data captured under common environmental conditions, for example "Clear and Day", while data for rare conditions (such as "Rainy and Night") is scarce.
   - This imbalance leads to good performance under common conditions but poor performance under rare ones. For example, the performance of Mask2Former under "Rainy and Night" is 40% lower than under "Clear and Day".
2. **High cost of manual annotation**:
   - Collecting and annotating more data under rare conditions is both expensive and time-consuming, especially for complex tasks such as semantic segmentation and end-to-end autonomous driving (E2E AD).
3. **Limitations of existing methods**:
   - High-fidelity 3D simulation engines (such as Unity and Unreal Engine) can generate synthetic data under diverse conditions, but they have a high computational cost and require additional expert driving plans.

#### Solution:

The authors propose **SynDiff-AD**, a data augmentation pipeline based on latent diffusion models (LDMs), which addresses the problem in the following ways:

1. **Generate synthetic data**:
   - Using ControlNet (a diffusion model conditioned on semantic maps) and a novel prompt generation scheme, the pipeline converts images captured under common conditions into images under rare conditions while preserving their semantic content.
   - Because the semantic layout is preserved, the generated synthetic data can be used directly for training without additional manual annotation.
2. **Improve model performance**:
   - Training with the synthetic data generated by SynDiff-AD improves semantic segmentation models (such as Mask2Former and SegFormer) and end-to-end autonomous driving models (such as AIM-2D and AIM-BEV). Reported gains are up to 1.2% (Mask2Former) and 2.3% (SegFormer) on Waymo, up to 1.4% and 0.7% on DeepDrive, and up to 20% in driving performance across diverse environmental conditions in the CARLA simulator.
3. **Efficient and economical**:
   - SynDiff-AD generates high-quality synthetic data without relying on expensive simulators or manual annotation, which helps balance the data distribution and improves model robustness.

### Summary

The core problem this paper tackles is the performance degradation caused by dataset imbalance in autonomous driving and semantic segmentation. With SynDiff-AD, the authors propose a diffusion-based data augmentation method that generates realistic synthetic data and improves model performance under rare conditions.
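
To make the augmentation idea concrete, the following is a minimal Python sketch of the bookkeeping around such a pipeline; it is not the authors' implementation. It counts how many real images fall into each condition subgroup, works out how many synthetic images each under-represented subgroup would need, and composes a text prompt for a target subgroup. The balancing rule (top every subgroup up to the size of the largest one), the prompt template, and the names `plan_augmentation` and `build_prompt` are all assumptions made for illustration. The image generation step itself (a ControlNet conditioned on the semantic map) is deliberately left out.

```python
from collections import Counter


def plan_augmentation(subgroup_labels):
    """Return how many synthetic images each subgroup would need.

    subgroup_labels: one condition label per real image, e.g. "Rainy, Night".
    Assumed rule (not taken from the paper): top up every subgroup to the
    size of the largest subgroup.
    """
    counts = Counter(subgroup_labels)
    if not counts:
        return {}
    largest = max(counts.values())
    return {group: largest - n for group, n in counts.items() if n < largest}


def build_prompt(scene_objects, weather, time_of_day):
    """Compose a prompt for a target subgroup (hypothetical template)."""
    objects = ", ".join(scene_objects)
    return (
        f"A driving scene with {objects}. "
        f"Weather: {weather}. Time of day: {time_of_day}."
    )


# Toy dataset: dominated by "Clear, Day", with few night or rain images.
labels = ["Clear, Day"] * 6 + ["Clear, Night"] * 3 + ["Rainy, Night"]
print(plan_augmentation(labels))
print(build_prompt(["cars", "a pedestrian", "traffic lights"], "Rainy", "Night"))
```

In a full pipeline, each planned synthetic image would pair such a prompt with the semantic map of a real source image, so the source image's existing annotation can serve as the label of the generated image.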