DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu
2023-11-27
Abstract: World models, especially in autonomous driving, are trending and drawing extensive attention due to their capacity for comprehending driving environments. The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering. However, a critical limitation in relevant research lies in its predominant focus on gaming environments or simulated settings, thereby lacking the representation of real-world driving scenarios. Therefore, we introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Regarding that modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing the powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline. In the initial phase, DriveDreamer acquires a deep understanding of structured traffic constraints, while the subsequent stage equips it with the ability to anticipate future states. The proposed DriveDreamer is the first world model established from real-world driving scenarios. We instantiate DriveDreamer on the challenging nuScenes benchmark, and extensive experiments verify that DriveDreamer empowers precise, controllable video generation that faithfully captures the structural constraints of real-world traffic scenarios. Additionally, DriveDreamer enables the generation of realistic and reasonable driving policies, opening avenues for interaction and practical applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses world model construction for autonomous driving: how to generate high-quality driving videos and reasonable driving strategies from real driving scenarios. It proposes a new model named **DriveDreamer**, which has the following main features:

1. **The first world model based on real driving scenarios**: DriveDreamer is the first world model built entirely on real driving scenarios. Compared with existing research, which mainly relies on game environments or simulated settings, it can better understand and predict complex real driving environments.
2. **Two-stage training process**: The first stage enables the model to understand structured traffic information, and the second stage equips it with the ability to predict future states. This split improves the model's convergence speed and sampling efficiency.
3. **Enhanced understanding of the real world**: To improve the understanding of real driving scenarios, the researchers introduce the Autonomous Driving Diffusion Model (Auto-DM) alongside the two-stage training process described above.
4. **Controllable driving video generation**: DriveDreamer can generate driving videos that closely conform to actual traffic constraints, conditioned on different inputs (such as structured traffic information, text prompts, and driving actions). These videos can be adjusted according to different driving strategies.
5. **Generation of future driving strategies**: In addition to generating videos, DriveDreamer can generate reasonable future driving strategies based on historical observations and Auto-DM features.

In summary, DriveDreamer aims to extract latent dynamics from real driving scenarios to build a comprehensive world model, generating high-quality driving videos and reasonable driving strategies, thereby advancing the development of autonomous driving technology.
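
The two-stage training process described above can be illustrated with a minimal sketch. It shows only the ordering idea: the second stage starts from whatever the first stage produced. Everything here is an assumption made for illustration: the names `Stage`, `TWO_STAGE_SCHEDULE`, `run_schedule`, and `dummy_train_stage` are hypothetical, and the stand-in training function does no actual learning. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str  # hypothetical label, not taken from the paper
    goal: str  # paraphrased from the summary above


# Hypothetical schedule: the stage order follows the summary; names are illustrative.
TWO_STAGE_SCHEDULE = (
    Stage("traffic_structure", "learn structured traffic constraints"),
    Stage("future_prediction", "learn to anticipate future states"),
)

State = Optional[List[str]]


def run_schedule(
    stages: Sequence[Stage],
    train_stage: Callable[[Stage, State], State],
    state: State = None,
) -> State:
    """Run the stages in order; each stage starts from the previous stage's state."""
    for stage in stages:
        state = train_stage(stage, state)
    return state


def dummy_train_stage(stage: Stage, state: State) -> State:
    """Stand-in for a real training loop: it only records which stages have run."""
    history = list(state or [])
    history.append(stage.name)
    return history


if __name__ == "__main__":
    print(run_schedule(TWO_STAGE_SCHEDULE, dummy_train_stage))
```

In a real system, `train_stage` would be replaced by an actual optimization loop and `state` by model weights; the sketch only makes explicit that stage two builds on the result of stage one.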