MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu
2024-10-12
Abstract: While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its transformative potential for autonomous driving simulation and beyond.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of generating high-quality, controllable 3D street scenes in unbounded scenarios. Existing 3D generative models remain underdeveloped for unbounded settings such as autonomous driving because data acquisition is costly. The paper introduces MagicDrive3D, a pipeline that supports multi-condition control (BEV maps, 3D objects, and text descriptions), generates high-quality 3D street scenes, and supports any-view rendering. Rather than reconstructing scenes before training a generative model, MagicDrive3D first trains a video generation model and then reconstructs from the generated data, which makes generation easy to control and yields static scenes, improving reconstruction quality. To handle minor errors in the generated content, the authors propose deformable Gaussian splatting with monocular depth initialization, together with appearance modeling to manage exposure discrepancies across viewpoints. Experiments on the nuScenes dataset show that MagicDrive3D generates diverse, high-quality 3D driving scenes and improves downstream tasks such as BEV segmentation, demonstrating its potential for autonomous driving simulation.