Abstract: Photorealistic 4D reconstruction of street scenes is essential for developing real-world simulators in autonomous driving. However, most existing methods perform this task offline and rely on time-consuming iterative processes, limiting their practical applications. To this end, we introduce the Large 4D Gaussian Reconstruction Model (DrivingRecon), a generalizable driving scene reconstruction model, which directly predicts 4D Gaussians from surround-view videos. To better integrate the surround-view images, the Prune and Dilate Block (PD-Block) is proposed to eliminate overlapping Gaussian points between adjacent views and remove redundant background points. To enhance cross-temporal information, dynamic and static decoupling is tailored to better learn geometry and motion features. Experimental results demonstrate that DrivingRecon significantly improves scene reconstruction quality and novel view synthesis compared to existing methods. Furthermore, we explore applications of DrivingRecon in model pre-training, vehicle adaptation, and scene editing. Our code is available at https://github.com/EnVision-Research/DriveRecon.

What problem does this paper attempt to address?

The problem this paper addresses is that existing 4D street-view reconstruction methods for autonomous driving usually run offline and rely on time-consuming iterative processes, which limits their use in practical applications. To overcome this, the authors propose DrivingRecon, a Large 4D Gaussian Reconstruction Model that directly predicts 4D scenes from surround-view videos. Specifically, the model addresses the following key issues:

1. **Removal of Redundant Gaussian Points**: Across different views, the model tends to predict duplicate Gaussian points, which degrades performance. To solve this, the authors propose the Prune and Dilate Block (PD-Block), which prunes overlapping Gaussian points between adjacent views as well as redundant background points.
2. **Decoupling of Dynamic and Static Objects**: The views in an autonomous driving scene are very sparse, so cross-time supervision is crucial. For dynamic objects, the model predicts not only the Gaussian points at the current moment but also the optical flow of each Gaussian point, so that the next frame can supervise the predicted Gaussian points. For static objects, the scene is rendered with the camera parameters of adjacent timestamps and only the static part is supervised.
3. **Efficient Temporal and Multi-view Information Fusion**: By introducing 3D-aware Position Encoding and a Temporal Cross Attention mechanism, the model better fuses information from different views and times, which improves the learning of geometric and motion features.
4. **High-Quality Novel View Synthesis**: DrivingRecon can synthesize new views given specific camera parameters, which supports adaptation to different vehicle models, and it enables scene editing on the predicted 4D scene (such as removing, inserting, and manipulating objects).

### Formula Summary

1. **Position Encoding Formula**:
   \[ [x, y, z] = R \times E^{-1} \times d_{u,v} \times [u, v, 1] + V \]
   where \(d_{u,v}\) is the predicted depth at pixel \((u, v)\), \(R\) and \(E\) are the camera's rotation (extrinsic) and intrinsic matrices respectively, so that \(E^{-1}\) maps the homogeneous pixel \([u, v, 1]\) to a camera ray, and \(V\) is the translation vector.
2. **Final Depth Calculation**:
   \[ d_f = \sum_{l=1}^{L} l \times \text{softmax}(d_c)_l + d_r \]
   where \(d_c\) is the depth classification output over \(L\) bins, \(\text{softmax}(d_c)_l\) is the probability assigned to bin \(l\), and \(d_r\) is the depth regression correction.
3. **Loss Function**:
   \[ L_{\text{total}} = \lambda_{\text{re}} L_{\text{re}} + \lambda_c L_c + \lambda_r L_r + \lambda_{\text{PE}} L_{\text{PE}} + \lambda_{\text{dr}} L_{\text{dr}} + \lambda_{\text{sr}} L_{\text{sr}} + \lambda_{\text{seg}} L_{\text{seg}} \]
   where \(\lambda_{\text{re}}, \lambda_c, \lambda_r, \lambda_{\text{PE}}, \lambda_{\text{dr}}, \lambda_{\text{sr}}, \lambda_{\text{seg}}\) are the weight coefficients of the respective loss terms.

Through these improvements, DrivingRecon significantly improves scene reconstruction quality and novel view synthesis, while demonstrating its potential in pre-training, vehicle adaptation, and scene-editing tasks.
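To make the position encoding formula concrete, here is a minimal sketch for a single pixel. It is not taken from the DrivingRecon code. It assumes that \(E\) acts as a skew-free pinhole intrinsic matrix and that \(R\) and \(V\) map camera coordinates to world coordinates. The helper names (`mat_vec`, `invert_intrinsics`, `pixel_to_world`) are hypothetical.

```python
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def mat_vec(m: Matrix3, v: Vector3) -> List[float]:
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def invert_intrinsics(fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Closed-form inverse of [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (assumed form of E)."""
    return [
        [1.0 / fx, 0.0, -cx / fx],
        [0.0, 1.0 / fy, -cy / fy],
        [0.0, 0.0, 1.0],
    ]


def pixel_to_world(
    u: float,
    v: float,
    depth: float,
    rotation: Matrix3,
    intrinsics_inv: Matrix3,
    translation: Vector3,
) -> List[float]:
    """Evaluate [x, y, z] = R * E^-1 * d * [u, v, 1] + V for one pixel."""
    ray = mat_vec(intrinsics_inv, [u, v, 1.0])   # E^-1 [u, v, 1]
    cam_point = [depth * c for c in ray]         # scale the ray by the predicted depth
    rotated = mat_vec(rotation, cam_point)       # rotate into the world frame
    return [r + t for r, t in zip(rotated, translation)]


# Illustrative values only: identity rotation, focal length 128, principal point (64, 64).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
e_inv = invert_intrinsics(fx=128.0, fy=128.0, cx=64.0, cy=64.0)
point = pixel_to_world(192.0, 64.0, 2.0, identity, e_inv, [1.0, 0.0, 0.0])
```

With these illustrative inputs the pixel lies one focal length to the right of the principal point, so the unprojected ray is `[1, 0, 1]` and the world point comes out at `[3, 0, 2]` after scaling by depth 2 and adding the translation.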
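The final depth formula reads as an expected bin index under the classification distribution plus a regression offset. The sketch below illustrates that reading under two assumptions: the softmax is taken over the \(L\) bin logits, and bin \(l\) contributes the value \(l\) directly, as the formula is written. The function names are hypothetical, and the real model may map bins to metric depth differently.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_depth(class_logits: Sequence[float], regression_offset: float) -> float:
    """d_f = sum_{l=1..L} l * softmax(d_c)_l + d_r for a single pixel."""
    probs = softmax(class_logits)
    expected_bin = sum(l * p for l, p in enumerate(probs, start=1))
    return expected_bin + regression_offset


# Uniform logits over four bins give an expected bin of 2.5; the offset shifts it to 2.8.
depth_uniform = final_depth([0.0, 0.0, 0.0, 0.0], 0.3)
```

The classification term picks a coarse depth and the regression term \(d_r\) refines it, so the output is continuous even though the bins are discrete.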
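The total loss is a plain weighted sum of the seven terms. In the sketch below only the term names follow the subscripts in the formula. The summary does not state the weight values, so the weights and loss values shown are placeholders for illustration. `total_loss` is a hypothetical helper.

```python
from typing import Dict


def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum_k lambda_k * L_k, requiring one weight per loss term."""
    mismatched = set(terms) ^ set(weights)
    if mismatched:
        raise KeyError(f"terms and weights disagree on: {sorted(mismatched)}")
    return sum(weights[name] * value for name, value in terms.items())


# Placeholder numbers, not values from the paper.
names = ["re", "c", "r", "PE", "dr", "sr", "seg"]
example_terms = {name: 1.0 for name in names}
example_weights = {name: 0.5 for name in names}
example_total = total_loss(example_terms, example_weights)
```

Keying both dictionaries by term name makes a missing or misspelled weight fail loudly instead of silently dropping a loss term.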

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

3-D Surround View for Advanced Driver Assistance Systems.

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations.

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations

S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

OmniRe: Omni Urban Scene Reconstruction

Visualization pipeline of autonomous driving scenes based on FCCR-3D reconstruction

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes

Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios

VDG: Vision-Only Dynamic Gaussian for Driving Simulation

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving