Abstract: Driving scene synthesis along free-form trajectories is essential for driving simulations to enable closed-loop evaluation of end-to-end driving policies. While existing methods excel at novel view synthesis on recorded trajectories, they face challenges with novel trajectories due to limited views of driving videos and the vastness of driving environments. To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, by leveraging a video generative prior to optimize a 3D model across a variety of trajectories. Concretely, we craft an inverse problem that enables a video diffusion model to be utilized as a prior for many-trajectory optimization of a parametric 3D model (e.g., Gaussian splatting). To seamlessly use the generative prior, we iteratively conduct this process during optimization. Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory, enabling free-form trajectory driving simulation. Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos.
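The iterative use of the generative prior can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the authors' implementation. It assumes a 1D "scene" viewed through fixed-width windows in place of Gaussian splatting, and a 3-tap smoothing filter as a hypothetical stand-in for the video diffusion model. The names `render`, `restore_with_prior`, `optimize` and `recorded_loss`, as well as the refresh schedule and learning rate, are all hypothetical. The sketch only shows the control flow: fit the recorded views, periodically render views on novel trajectories, restore them with the prior, and feed them back as pseudo ground truth.

```python
import numpy as np

WIDTH = 8  # pixels per toy 1D "image" (hypothetical)

def render(scene: np.ndarray, pose: int) -> np.ndarray:
    """Toy stand-in for R(G, P): the camera at integer `pose` sees a window of a 1D scene."""
    return scene[pose:pose + WIDTH]

def restore_with_prior(view: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the video diffusion model: a 3-tap box filter
    that suppresses high-frequency 'artifacts' in a rendered view."""
    padded = np.pad(view, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def optimize(recorded, novel_poses, scene_len, steps=300, lr=0.1, refresh_every=100):
    """Fit the scene to recorded views plus periodically refreshed pseudo ground truth."""
    scene = np.zeros(scene_len)
    pseudo_gt = {}  # novel pose -> restored view used as supervision
    for step in range(steps):
        if step > 0 and step % refresh_every == 0:
            # Render the novel trajectory with the current model, restore it
            # with the prior, and store the result as new pseudo ground truth.
            for pose in novel_poses:
                pseudo_gt[pose] = restore_with_prior(render(scene, pose))
        # Gradient of the per-view mean-squared error, summed over all supervised views.
        grad = np.zeros_like(scene)
        for pose, target in [*recorded.items(), *pseudo_gt.items()]:
            residual = render(scene, pose) - target
            grad[pose:pose + WIDTH] += 2.0 * residual / WIDTH
        scene -= lr * grad
    return scene, pseudo_gt

def recorded_loss(scene, recorded):
    """Mean-squared error on the recorded views only."""
    return float(np.mean([np.mean((render(scene, p) - t) ** 2) for p, t in recorded.items()]))

# Usage: recorded poses cover pixels 0..11; the novel poses reach pixels up to 17.
truth = np.sin(np.linspace(0.0, 3.0, 20))
recorded = {p: truth[p:p + WIDTH] for p in (0, 2, 4)}
scene, pseudo_gt = optimize(recorded, novel_poses=[6, 10], scene_len=20)
```

The pseudo ground truth is refreshed only every `refresh_every` steps, so the targets stay fixed between refreshes and each phase is an ordinary supervised fit. This schedule is an illustrative choice, not one taken from the paper.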

What problem does this paper attempt to address?

The problem this paper addresses is how to generate high-quality driving scenes along free-form trajectories for autonomous driving simulation. Existing methods struggle with novel views outside the recorded trajectory because driving videos capture only a limited set of viewpoints of large, complex driving environments. As a result, their rendering quality degrades for views far from the recorded trajectory, which limits their flexibility and practicality. To solve this problem, the authors propose DriveX, a framework that leverages a video generative prior to optimize a 3D model (such as 3D Gaussian splatting) for free-form trajectory scene synthesis. DriveX designs an inverse problem that allows a video diffusion model to be used as a prior for multi-trajectory optimization. The process includes:

1. **Inverse problem design**: Rendered novel views are compared with the recorded images to identify potential artifact regions. The remaining reliable regions serve as conditional inputs for restoring the rendered views.
2. **Iterative optimization**: During training, DriveX repeatedly uses the video diffusion model to generate improved views and feeds them back into the optimization of the 3D model as a supervisory signal.
3. **Application to generated videos**: Beyond real-world driving videos, DriveX can also be applied to AI-generated driving videos and maintains high synthesis quality despite the content inconsistencies inherent in them.

Through this method, DriveX can generate high-quality virtual driving environments on free-form trajectories outside the recorded one, improving both the quality and the flexibility of driving scene synthesis.

### Main contributions

1. Proposes using a video generative prior, which carries rich spatio-temporal knowledge, for general driving scene synthesis from videos recorded along a single trajectory.
2. Introduces the DriveX framework, which constructs an inverse problem that enables a video diffusion model to serve as a prior for multi-trajectory optimization.
3. Shows experimentally that DriveX significantly outperforms existing state-of-the-art methods in driving scene synthesis, especially when only single-trajectory recordings are available. DriveX can also render novel trajectories in AI-generated virtual driving worlds and remains superior even when the generated videos contain inherent content inconsistencies.

### Formula summary

- **Rendering formula**:
  \[ I' = R(G, P_i) = \sum_{k=1}^{N} o_k G_k c_k(P_i) \prod_{j=1}^{k-1} (1 - o_j G_j) \]
  where \(R\) is a differentiable Gaussian rasterizer, \(P_i\) is the \(i\)-th camera pose, and \(I'\) is the rendered image. For the \(k\)-th of the \(N\) Gaussians, sorted front to back, \(o_k\) is its opacity, \(G_k\) is its projected 2D Gaussian evaluated at the pixel, and \(c_k\) is its color.
- **Loss function**:
  \[ L_{\text{img}}(I'_i, I_i) = \lambda \left\| I'_i - I_i \right\|_1 + (1 - \lambda) L_{\text{SSIM}}(I'_i, I_i) \]
  where \(I_i\) is the recorded image for pose \(P_i\), \(L_{\text{SSIM}}\) is the structural similarity (SSIM) loss, and \(\lambda\) is a weighting factor.
- **Inverse problem formula**:
  \[ V' = f(V) + \epsilon \]
  where \(V'\) is the observed (degraded) video, \(V\) is the clean video to be recovered, \(f\) is the measurement function, and \(\epsilon\) is random noise.
- **Unreliable region mask**:
  \[ M = \mathbb{1}\left(\text{SSIM}(I_{\text{ren}}, \hat{I}_{\text{ren}}) < \tau\right) \]
  where \(\mathbb{1}(\cdot)\) is the indicator function and \(\tau\) is a threshold. Pixels whose SSIM falls below \(\tau\) are marked as unreliable.

These formulas summarize the key technical steps DriveX uses to synthesize driving scenes on free-form trajectories.
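The formulas in the summary can be made concrete with a small NumPy sketch. This is not the authors' implementation and has the following limitations:

- `composite` evaluates the rendering sum for a single pixel. The projection that turns each Gaussian and the pose \(P_i\) into a footprint \(G_k\) is omitted.
- `ssim_map` uses a box window instead of the usual Gaussian window.
- The sketch assumes \(L_{\text{SSIM}} = 1 - \text{SSIM}\), \(\lambda = 0.8\) and \(\tau = 0.5\).
- The measurement function \(f\) is assumed to be a mask that zeroes out unreliable pixels.

All function names and default values are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def composite(opacities, footprints, colors):
    """Per-pixel rendering: sum_k o_k G_k c_k * prod_{j<k} (1 - o_j G_j).
    Inputs are sorted front to back; colors has shape (N, 3)."""
    alpha = opacities * footprints                                   # o_k G_k
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return (alpha * transmittance) @ colors

def box_mean(img, win):
    """Local mean over a win x win box (simplification of SSIM's Gaussian window)."""
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    return sliding_window_view(padded, (win, win)).mean(axis=(-2, -1))

def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM for grayscale images in [0, 1] (K1=0.01, K2=0.03, L=1)."""
    mu_x, mu_y = box_mean(x, win), box_mean(y, win)
    var_x = box_mean(x * x, win) - mu_x ** 2
    var_y = box_mean(y * y, win) - mu_y ** 2
    cov = box_mean(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def image_loss(rendered, recorded, lam=0.8):
    """L_img = lam * L1 + (1 - lam) * L_SSIM, assuming L_SSIM = 1 - mean SSIM."""
    l1 = np.abs(rendered - recorded).mean()
    l_ssim = 1.0 - ssim_map(rendered, recorded).mean()
    return lam * l1 + (1.0 - lam) * l_ssim

def unreliable_mask(i_ren, i_hat_ren, tau=0.5):
    """M = 1(SSIM(I_ren, I_hat_ren) < tau); True marks unreliable pixels."""
    return ssim_map(i_ren, i_hat_ren) < tau

def measure(video, unreliable, noise_std=0.0, seed=0):
    """V' = f(V) + eps, with f assumed to zero out unreliable pixels."""
    rng = np.random.default_rng(seed)
    return np.where(unreliable, 0.0, video) + noise_std * rng.standard_normal(video.shape)
```

As a quick check, two fully opaque-in-sequence Gaussians with opacities 0.5 and 1.0 (footprints 1) split the pixel evenly between their colors. The first Gaussian contributes \(0.5\), and the second contributes \(1.0 \times (1 - 0.5) = 0.5\).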

Driving Scene Synthesis on Free-form Trajectories with Generative Prior
