Abstract: World models, especially in autonomous driving, are trending and drawing extensive attention due to their capacity for comprehending driving environments. The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering. However, a critical limitation in relevant research lies in its predominant focus on gaming environments or simulated settings, thereby lacking the representation of real-world driving scenarios. Therefore, we introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Regarding that modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing the powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline. In the initial phase, DriveDreamer acquires a deep understanding of structured traffic constraints, while the subsequent stage equips it with the ability to anticipate future states. The proposed DriveDreamer is the first world model established from real-world driving scenarios. We instantiate DriveDreamer on the challenging nuScenes benchmark, and extensive experiments verify that DriveDreamer empowers precise, controllable video generation that faithfully captures the structural constraints of real-world traffic scenarios. Additionally, DriveDreamer enables the generation of realistic and reasonable driving policies, opening avenues for interaction and practical applications.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses world model construction for autonomous driving: how to generate high-quality driving videos and reasonable driving strategies from real driving scenarios. It proposes a new model named **DriveDreamer**, which has the following main features:

1. **The first world model based on real driving scenarios**: The paper presents DriveDreamer as the first world model built entirely on real driving scenarios. Compared to existing research that mainly relies on game environments or simulated settings, it can better understand and predict complex real driving environments.
2. **Two-stage training process**: The first stage enables the model to understand structured traffic information, and the second stage equips it with the ability to predict future states, thereby improving the model's convergence speed and sampling efficiency.
3. **Enhanced understanding of the real world**: To improve the understanding of real driving scenarios, the researchers introduce the Autonomous Driving Diffusion Model (Auto-DM), which is trained with the two-stage process: the first stage teaches it traffic structure information, and the second stage endows it with predictive capabilities.
4. **Controllable driving video generation**: DriveDreamer can generate driving videos that closely conform to actual traffic constraints, conditioned on different inputs (such as structured traffic information, text prompts, and driving actions). These videos can be adjusted according to different driving strategies.
5. **Generation of future driving strategies**: In addition to generating videos, DriveDreamer can also generate reasonable future driving strategies based on historical observations and Auto-DM features.

In summary, DriveDreamer aims to extract latent dynamics from real driving scenarios to build a comprehensive world model, generating high-quality driving videos and reasonable driving strategies, thereby advancing the development of autonomous driving technology.
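
The two-stage schedule summarized above can be sketched as a small Python outline. This is only an illustration, not the authors' implementation: the `Stage` class, the `run_pipeline` helper, and the stage names are hypothetical, and the way the conditioning inputs are divided between the two stages is an assumption, since the summary does not say which inputs belong to which stage. The only property taken from the summary is the ordering: a stage that learns traffic structure runs first, and a stage that learns to predict future states continues from it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure, for illustration only)."""
    name: str
    conditions: Tuple[str, ...]  # conditioning inputs assumed active in this stage
    objective: str


# ASSUMPTION: this split of conditions across stages is illustrative;
# the summary only lists the inputs, not the stage each one belongs to.
STAGES = (
    Stage(
        name="stage_1_traffic_structure",
        conditions=("structured_traffic_info", "text_prompt"),
        objective="generate frames that respect structured traffic constraints",
    ),
    Stage(
        name="stage_2_future_prediction",
        conditions=(
            "structured_traffic_info",
            "text_prompt",
            "driving_actions",
            "historical_observations",
        ),
        objective="predict future frames and future driving actions",
    ),
)


def run_pipeline(
    train_stage: Callable[[Stage, Optional[list]], list],
    stages: Tuple[Stage, ...] = STAGES,
) -> Tuple[Optional[list], List[str]]:
    """Run the stages in order, handing each stage the previous stage's weights."""
    weights = None
    completed = []
    for stage in stages:
        # Each stage starts from the weights produced by the stage before it.
        weights = train_stage(stage, weights)
        completed.append(stage.name)
    return weights, completed


def stub_train(stage: Stage, weights: Optional[list]) -> list:
    """Stand-in for real training: records which stages touched the weights."""
    return (weights or []) + [stage.name]


final_weights, completed = run_pipeline(stub_train)
print(completed)
```

The stub trainer only records stage names, so the sketch shows the control flow (stage 2 continues from stage 1) and nothing about the diffusion model itself.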

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

InfinityDrive: Breaking Time Limits in Driving World Models

Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Safedrive Dreamer: Navigating Safety–critical Scenarios in Autonomous Driving with World Models

CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

World Models for Autonomous Driving: An Initial Survey

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

Physical Informed Driving World Model

DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

Dream to Drive With Predictive Individual World Model

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving