Abstract: Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. Therefore, we have developed a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.

What problem does this paper attempt to address?

This paper asks how generative models can produce large-scale, diverse labeled data for autonomous driving so that downstream perception tasks (such as 3D object detection and tracking) improve. Specifically, it studies how to scale up the quantity of generated data while preserving its quality and diversity, so that perception models trained on the generated data perform significantly better.

### Background and Problem Description

1. **The Need for Large-Scale Labeled Data in Autonomous Driving**
   - Progress in autonomous driving depends on large-scale labeled datasets.
   - Acquiring and labeling real-world data is expensive and time-consuming, and it raises data privacy and usage-rights issues.
   - Using generative models to create large amounts of freely labeled data has therefore become an important research direction.
2. **Limitations of Existing Methods**
   - Existing generative models can produce high-quality driving-scene videos, but their benefit does not scale well as the amount of generated data grows.
   - For example, methods such as Panacea fail to significantly improve downstream perception performance when generating a large amount of data, mainly because the generated data lacks diversity.
3. **The Importance of Introducing the Subject Control Mechanism**
   - To overcome these limitations, the paper proposes a new generative framework, SubjectDrive, which increases the diversity of generated data by introducing a subject control mechanism.
   - The subject control mechanism lets the generative model draw on diverse elements from external data sources, and thereby generate more varied and useful samples.

### Core Contributions of the Paper

- **Proposing the SubjectDrive Framework**: The framework significantly improves the diversity and quality of generated data by introducing a subject control mechanism.
- **Verifying the Effectiveness of Scaling Generated Data**: Experiments show that SubjectDrive keeps improving downstream perception performance as the amount of generated data increases, and can even outperform models pre-trained on large-scale real datasets.
- **Innovative Technical Modules**: These include the Subject Prompt Adapter (SPA), the Subject Visual Adapter (SVA), and Augmented Temporal Attention (ATA). The modules work together so that the generated videos remain spatio-temporally consistent.

### Formula Representation

The key formulas in the paper are as follows:

1. **Generation Process of the Diffusion Model**

   \[ p_\theta(x_{t-1}\mid x_t)=\mathcal{N}(x_{t-1};\mu_\theta(x_t,t),\Sigma_\theta(x_t,t)) \]

   \[ x_t = \sqrt{\bar{\alpha}_t}\,x_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon,\quad\epsilon\sim\mathcal{N}(0,I),\quad x_0\sim p(x) \]

   \[ \min_\theta\mathbb{E}_{t,x,\epsilon}\|\epsilon - \epsilon_\theta(x_t,t)\|^2 \]

2. **Enhanced Text Embedding of the Subject Prompt Adapter (SPA)**

   \[ \hat{z}_t^i=\text{MLP}([z_t^i + z_{\text{id}}^i,z_v^i]),\quad i\in\{1,2,\dots,M\} \]

3. **Position-Enhanced Subject Embedding of the Subject Visual Adapter (SVA)**

   \[ f_v^l=\text{MLP}([f_v,\text{Fourier}(l)]) \]

   \[ z = z+\tanh(\gamma)\cdot T_S(\text{SelfAttn}([z,f_v^l])) \]
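
To make the diffusion formulas above concrete, here is a minimal NumPy sketch of the forward noising step, the noise-prediction loss, and the tanh-gated residual update. It is an illustration under stated assumptions, not the paper's implementation: the linear beta schedule, the function names (`make_alpha_bar`, `q_sample`, `noise_prediction_loss`, `gated_residual`), and the array shapes are all hypothetical, and `attn_out` merely stands in for the output of \(T_S(\text{SelfAttn}([z,f_v^l]))\).

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.
    The schedule and its endpoints are assumptions chosen for illustration."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, eps):
    """Forward noising: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the true noise and a model's predicted noise."""
    return float(np.mean((eps - eps_pred) ** 2))

def gated_residual(z, attn_out, gamma):
    """Gated update z + tanh(gamma) * attn_out.
    attn_out is a placeholder for the attention branch output."""
    return z + np.tanh(gamma) * attn_out

# Toy usage with random arrays in place of real latents.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 8))        # stand-in for clean latents
eps = rng.standard_normal(x0.shape)     # Gaussian noise
x_t = q_sample(x0, t=500, alpha_bar=alpha_bar, eps=eps)

# A real model would predict eps from (x_t, t); here we use a dummy prediction.
eps_pred = np.zeros_like(eps)
loss = noise_prediction_loss(eps, eps_pred)

z = rng.standard_normal((4, 8))
attn_out = rng.standard_normal((4, 8))
z_new = gated_residual(z, attn_out, gamma=0.0)  # gamma = 0 leaves z unchanged
```

Because \(\tanh(0)=0\), setting `gamma` to zero turns the attention branch off entirely, so the gate controls how strongly the subject features influence `z`.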

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
