Abstract: We present DeSiRe-GS, a self-supervised Gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. These extracted 2D motion priors are then mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians in the second stage. Combined with the introduced geometric regularizations, our method is able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, surpassing prior self-supervised arts and achieving accuracy comparable to methods relying on external 3D bounding box annotations. Code is available at https://github.com/chengweialan/DeSiRe-GS

What problem does this paper attempt to address?

This paper addresses two problems in autonomous driving scenes: effective static-dynamic decomposition and high-quality surface reconstruction. Specifically:

1. **Static-Dynamic Decomposition**: In autonomous driving scenes, distinguishing static objects (such as buildings and parked vehicles) from dynamic objects (such as moving vehicles and pedestrians) is crucial for scene understanding. Existing methods struggle to perform this decomposition without additional 3D annotations. The proposed method, DeSiRe-GS, extracts motion information in a self-supervised manner to achieve it.
2. **High-Quality Surface Reconstruction**: High-quality surface reconstruction is important for generating realistic images and for accurate scene understanding. Existing 3D Gaussian Splatting (3DGS) methods over-fit when dealing with dynamic objects, which leads to inaccurate geometry. The paper improves surface reconstruction quality by introducing geometric regularization and temporal cross-view consistency.

### Main Contributions

1. **Motion Information Extraction**: Based on the simple observation that 3DGS fails to model dynamic regions, the paper proposes a way to extract motion information from appearance differences.
2. **Distilling 2D Motion Priors into the Global Gaussian Space**: Using a time-dependent Gaussian model, the extracted 2D motion priors are distilled into the global Gaussian space in a differentiable manner, correcting the inaccurate properties of each Gaussian.
3. **Introducing Effective 3D Regularization and Temporal Cross-View Consistency**: Geometric regularization and temporal cross-view consistency produce physically plausible Gaussian ellipsoids, further improving the quality of decomposition and reconstruction.

### Method Overview

1. **Dynamic Mask Extraction (Stage 1)**:
   - Use a pre-trained base model to extract features of the rendered image and the ground-truth image.
   - Calculate the per-pixel feature difference to generate a dynamic mask.
   - Predict the dynamic property through a Multi-Layer Perceptron (MLP) decoder and generate a binary mask.
2. **Static-Dynamic Decomposition (Stage 2)**:
   - Use the dynamic mask extracted in Stage 1 to regularize the 2D velocity map.
   - Separate dynamic and static Gaussians with a simple threshold to achieve self-supervised decomposition.
3. **Surface Reconstruction**:
   - **Geometric Regularization**: Flatten each 3D Gaussian ellipsoid into a 2D disk by minimizing the scale along its shortest axis, so that it better fits the object surface.
   - **Normal Derivation**: Derive the normal vector directly from the scale vector instead of attaching a separate normal vector.
   - **Giant Gaussian Regularization**: Introduce a penalty term to prevent overly large Gaussian ellipsoids.
   - **Temporal-Spatial Consistency**: Use temporal cross-view information to enhance geometric consistency and reduce over-fitting.

### Experimental Results

- **Quantitative Results**: On the Waymo Open Dataset and the KITTI dataset, DeSiRe-GS achieves state-of-the-art performance in both image reconstruction and novel view synthesis.
- **Qualitative Analysis**: Visualizations show that DeSiRe-GS is significantly better than other methods at static-dynamic decomposition and depth prediction.
- **Ablation Study**: Ablation experiments verify the effectiveness of each component, in particular the performance gains from the motion mask and normal supervision.

In conclusion, DeSiRe-GS achieves effective static-dynamic decomposition and high-quality surface reconstruction in autonomous driving scenes in a self-supervised manner, without additional 3D annotations.
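
To make the method overview more concrete, the following is a minimal NumPy sketch of two of its ideas: a per-pixel feature difference turned into a binary motion mask, and scale-based penalties that flatten Gaussians and discourage giant ones. It is an illustration under stated assumptions, not the authors' implementation. The cosine dissimilarity metric, the fixed `threshold`, the hinge form of the giant-Gaussian penalty, `max_scale`, and all function names are hypothetical choices; the summary does not specify them, and the paper predicts the mask with a learned MLP decoder rather than a fixed threshold.

```python
import numpy as np

def motion_mask(feat_rendered, feat_gt, threshold=0.5):
    """Per-pixel feature dissimilarity and a thresholded binary mask.

    feat_rendered, feat_gt: (H, W, C) feature maps of the rendered and
    ground-truth images (assumed to come from some pre-trained extractor).
    The cosine metric and fixed threshold are assumptions for illustration.
    """
    eps = 1e-8
    a = feat_rendered / (np.linalg.norm(feat_rendered, axis=-1, keepdims=True) + eps)
    b = feat_gt / (np.linalg.norm(feat_gt, axis=-1, keepdims=True) + eps)
    cos = np.sum(a * b, axis=-1)      # cosine similarity per pixel
    dissim = (1.0 - cos) / 2.0        # mapped to [0, 1]
    return dissim, dissim > threshold

def flatten_loss(scales):
    """Mean of the smallest per-Gaussian scale; minimizing it pushes
    each ellipsoid toward a flat disk. scales: (N, 3), positive."""
    return float(np.mean(np.min(scales, axis=-1)))

def giant_penalty(scales, max_scale=1.0):
    """Hinge penalty on the largest axis of each Gaussian (assumed form)."""
    largest = np.max(scales, axis=-1)
    return float(np.mean(np.maximum(largest - max_scale, 0.0)))

# Toy usage: one pixel whose rendered feature disagrees with ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 4, 8))
rendered = gt.copy()
rendered[1, 2] = -gt[1, 2]            # simulate a poorly reconstructed pixel
dissim, mask = motion_mask(rendered, gt)
print("flagged pixels:", int(mask.sum()))

scales = np.array([[1.0, 2.0, 0.5], [3.0, 0.1, 1.0]])
print("flatten loss:", flatten_loss(scales))
print("giant penalty:", giant_penalty(scales))
```

In this toy example only the deliberately corrupted pixel is flagged, which mirrors the observation that regions a static 3DGS model cannot reproduce show up as appearance differences. In a training setting the two scale penalties would be weighted and added to the photometric loss; the weights are not given in the summary.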

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

$\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

MD-Surf: Multimodal Neural Surface Reconstruction from Driving Views

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting

Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes

3D StreetUnveiler with Semantic-Aware 2DGS

LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting

MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction

GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction

CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM

G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs