MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu

2024-10-12

Abstract: While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its transformative potential for autonomous driving simulation and beyond.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the generation of high-quality, controllable 3D street scenes in unbounded scenarios. Existing 3D generative models remain underdeveloped for settings such as autonomous driving because the required data is expensive to acquire. The paper introduces MagicDrive3D, a pipeline that supports multi-condition control (BEV maps, 3D objects, and text descriptions), generates high-quality 3D street scenes, and supports any-view rendering. Instead of reconstructing scenes before training a generative model, MagicDrive3D first trains a video generation model and then reconstructs the scene from the generated data. This ordering makes generation easy to control and yields static scenes, which in turn improves reconstruction quality. To handle minor errors in the generated content, the authors propose deformable Gaussian splatting with monocular depth initialization, together with appearance modeling to manage exposure discrepancies across viewpoints. Validation on the nuScenes dataset shows that MagicDrive3D generates diverse, high-quality 3D driving scenes and improves downstream tasks such as BEV segmentation, indicating its potential for autonomous driving simulation.
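To make the idea of monocular depth initialization more concrete, the following is a minimal sketch of how a per-pixel depth map could be lifted into 3D points that might seed Gaussian centers. It is not the authors' implementation: the function name `backproject_depth`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the `stride` parameter are all assumptions made for illustration, and the depth values are assumed to be already aligned to a metric scale.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,
) -> List[Point3D]:
    """Lift a depth map to camera-frame 3D points with a pinhole model.

    Hypothetical helper: depth[v][u] is the depth at pixel column u, row v,
    assumed to be metric; non-positive values are treated as invalid.
    """
    points: List[Point3D] = []
    for v in range(0, len(depth), stride):
        row = depth[v]
        for u in range(0, len(row), stride):
            z = row[u]
            if z <= 0:
                continue  # skip pixels without a usable depth estimate
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

In a full pipeline these camera-frame points would still need to be transformed into a shared world frame using each camera's pose before they could initialize a single scene; that step is omitted here.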

MagicDrive: Street View Generation with Diverse 3D Geometry Control

DreamDrive: Generative 4D Scene Modeling from Street View Images

3-D Surround View for Advanced Driver Assistance Systems

MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving

S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes