Driving Scene Synthesis on Free-form Trajectories with Generative Prior

Zeyu Yang, Zijie Pan, Yuankun Yang, Xiatian Zhu, Li Zhang
2024-12-03
Abstract: Driving scene synthesis along free-form trajectories is essential for driving simulations to enable closed-loop evaluation of end-to-end driving policies. While existing methods excel at novel view synthesis on recorded trajectories, they face challenges with novel trajectories due to limited views of driving videos and the vastness of driving environments. To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, by leveraging video generative prior to optimize a 3D model across a variety of trajectories. Concretely, we crafted an inverse problem that enables a video diffusion model to be utilized as a prior for many-trajectory optimization of a parametric 3D model (e.g., Gaussian splatting). To seamlessly use the generative prior, we iteratively conduct this process during optimization. Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory, enabling free-form trajectory driving simulation. Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to generate high-quality driving scenes along free-form trajectories for autonomous driving simulation. Existing methods struggle with novel views outside the recorded trajectory because driving videos offer only limited viewpoints while driving environments are large and complex. As a result, they perform poorly on views far from the recorded trajectory, which limits their flexibility and practicality.

To solve this, the authors propose a framework named DriveX, which optimizes a 3D model (such as Gaussian splatting) by leveraging a video generative prior to synthesize free-trajectory driving scenes. DriveX formulates an inverse problem that lets a video diffusion model serve as a prior for multi-trajectory optimization. The approach comprises:

1. **Inverse problem design**: Compare the rendered novel views with the recorded images to identify potential artifact areas, then use the remaining reliable areas as conditional inputs to restore the rendered views.
2. **Iterative optimization**: During training, DriveX repeatedly uses the video diffusion model to generate improved views and feeds them back as a supervisory signal for optimizing the 3D model.
3. **Application to generated videos**: Besides real-world driving videos, DriveX can be applied to AI-generated virtual driving videos and maintains high synthesis quality despite their inherent content inconsistencies.

With this method, DriveX can generate high-quality virtual driving environments along free-form trajectories outside the recorded one, significantly improving the quality and flexibility of driving scene synthesis.

### Main contributions

1. Propose using a video generative prior (with rich spatio-temporal knowledge) for general driving scene synthesis from single-trajectory recorded videos.
2. Introduce the DriveX framework and construct an inverse problem that enables the video diffusion model to be used as a prior for multi-trajectory optimization.
3. Experiments show that DriveX significantly outperforms existing state-of-the-art methods in driving scene synthesis, especially with single-trajectory recorded videos. DriveX can also render novel trajectories in AI-generated virtual driving worlds and remains superior despite their inherent content inconsistencies.

### Formula summary

- **Rendering formula**:
  \[ I' = R(G, P_i) = \sum_{k=1}^{N} o_k G_k c_k(P_i) \prod_{j=1}^{k-1} (1 - o_j G_j) \]
  where \(R\) represents a differentiable Gaussian rasterizer, \(I'\) represents the rendered image, and \(c_k\) is the color of the \(k\)-th Gaussian.
- **Loss function**:
  \[ L_{\text{img}}(I'_i, I_i) = \lambda \left\| I'_i - I_i \right\|_1 + (1 - \lambda) L_{\text{SSIM}}(I'_i, I_i) \]
  where \(L_{\text{SSIM}}\) is the structural similarity (SSIM) loss and \(\lambda\) is the weight.
- **Inverse problem formula**:
  \[ V' = f(V) + \epsilon \]
  where \(\epsilon\) is random noise and \(f\) is the measurement function.
- **Unreliable region mask**:
  \[ M = 1(\text{SSIM}(I_{\text{ren}}, \hat{I}_{\text{ren}}) < \tau) \]

These formulas capture the key steps of DriveX's free-trajectory driving scene synthesis.
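The rendering formula, image loss, and unreliable region mask can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names are hypothetical, and several readings are assumptions not stated in the summary: \(o_k G_k\) is treated as a single per-pixel alpha value, Gaussians are assumed sorted front to back, the L1 term is averaged over pixels, \(L_{\text{SSIM}}\) is taken to be \(1 - \text{SSIM}\), and the SSIM values are supplied by the caller rather than computed.

```python
from typing import List, Sequence


def composite_pixel(alphas: Sequence[float], colors: Sequence[float]) -> float:
    """Front-to-back compositing for one pixel and one color channel.

    alphas[k] stands in for o_k * G_k and colors[k] for c_k(P_i);
    both readings are assumptions. Inputs must be sorted front to back.
    """
    transmittance = 1.0  # running product of (1 - o_j * G_j) for j < k
    pixel = 0.0
    for alpha, color in zip(alphas, colors):
        pixel += alpha * color * transmittance
        transmittance *= 1.0 - alpha
    return pixel


def image_loss(rendered: Sequence[float], target: Sequence[float],
               ssim_value: float, lam: float) -> float:
    """lam * L1 + (1 - lam) * L_SSIM.

    Assumptions: L1 is the mean absolute difference over pixels, and
    L_SSIM = 1 - SSIM with the SSIM score precomputed by the caller.
    """
    l1 = sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)
    return lam * l1 + (1.0 - lam) * (1.0 - ssim_value)


def unreliable_mask(ssim_map: Sequence[Sequence[float]], tau: float) -> List[List[int]]:
    """M = 1(SSIM < tau): mark pixels whose local SSIM falls below tau."""
    return [[1 if s < tau else 0 for s in row] for row in ssim_map]
```

For example, `composite_pixel([0.5, 1.0], [1.0, 0.5])` blends a half-transparent front Gaussian with an opaque one behind it, and `unreliable_mask(ssim_map, tau)` returns 1 wherever the rendered view is flagged as unreliable. A real pipeline would run these operations on image tensors inside a differentiable rasterizer rather than on Python lists.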