Abstract: Closed-loop simulation is essential for advancing end-to-end autonomous driving systems. Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on conditions closely aligned with the training data distribution, which is largely confined to forward-driving scenarios. Consequently, these methods struggle to render complex maneuvers (e.g., lane changes, acceleration, and deceleration). Recent advances in autonomous-driving world models have demonstrated the potential to generate diverse driving videos. However, these approaches remain constrained to 2D video generation and inherently lack the spatiotemporal coherence required to capture the intricacies of dynamic driving environments. In this paper, we introduce DriveDreamer4D, which enhances 4D driving scene representation by leveraging world model priors. Specifically, we use the world model as a data machine to synthesize novel trajectory videos based on real-world driving data. Notably, we explicitly leverage structured conditions to control the spatiotemporal consistency of foreground and background elements, so that the generated data adhere closely to traffic constraints. To our knowledge, DriveDreamer4D is the first to use video generation models to improve 4D reconstruction in driving scenarios. Experimental results show that DriveDreamer4D significantly enhances generation quality under novel trajectory views, achieving relative FID improvements of 24.5%, 39.0%, and 10.5% over PVG, S3Gaussian, and Deformable-GS, respectively. Moreover, DriveDreamer4D markedly enhances the spatiotemporal coherence of driving agents, as verified by a comprehensive user study and by relative increases of 20.3%, 42.0%, and 13.7% in the NTA-IoU metric over the same three baselines, respectively.
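The abstract reports relative improvements without defining them. A minimal sketch, assuming the conventional definition (change relative to the baseline score, with the sign flipped for lower-is-better metrics such as FID), is given below. The function name `relative_improvement` is hypothetical, and the inputs are toy numbers chosen for illustration, not results from the paper.

```python
def relative_improvement(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Return the relative improvement of `ours` over `baseline`, in percent.

    Assumption: relative improvement = (gain over baseline) / baseline * 100,
    where "gain" is a decrease for lower-is-better metrics (e.g. FID)
    and an increase for higher-is-better metrics (e.g. an IoU score).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    if lower_is_better:
        # FID-like metric: a drop below the baseline counts as an improvement
        delta = baseline - ours
    else:
        # IoU-like metric: a rise above the baseline counts as an improvement
        delta = ours - baseline
    return 100.0 * delta / baseline


# Toy numbers for illustration only, not results reported in the paper.
print(relative_improvement(100.0, 75.5, lower_is_better=True))   # FID-like
print(relative_improvement(0.5, 0.75, lower_is_better=False))    # IoU-like
```

Dividing by the baseline rather than by the new score keeps the percentages comparable across baselines with different absolute scores, which is how a single metric can yield different relative gains against PVG, S3Gaussian, and Deformable-GS.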

DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation

Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention

3-D Surround View for Advanced Driver Assistance Systems

DreamDrive: Generative 4D Scene Modeling from Street View Images

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

DiVE: DiT-based Video Generation with Enhanced Control

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control

MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

Driving Scene Synthesis on Free-form Trajectories with Generative Prior

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

Physical Informed Driving World Model

DrivingGen: Efficient Safety-Critical Driving Video Generation with Latent Diffusion Models

FreeVS: Generative View Synthesis on Free Driving Trajectory

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving