DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wenzhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
2024-11-18
Abstract: We present DeSiRe-GS, a self-supervised Gaussian splatting representation that enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. In the second stage, these extracted 2D motion priors are mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians. Combined with the introduced geometric regularizations, our method is able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, which surpasses prior self-supervised methods and achieves accuracy comparable to methods relying on external 3D bounding box annotations. Code is available at https://github.com/chengweialan/DeSiRe-GS
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two problems in autonomous driving scenes: effective static-dynamic decomposition and high-quality surface reconstruction. Specifically:

1. **Static-Dynamic Decomposition**: Distinguishing static objects (such as buildings and parked vehicles) from dynamic objects (such as moving vehicles and pedestrians) is crucial for scene understanding. Existing methods struggle to perform this decomposition without additional 3D annotations. The proposed method, DeSiRe-GS, extracts motion information in a self-supervised manner and uses it to separate static from dynamic content.
2. **High-Quality Surface Reconstruction**: Accurate surface reconstruction is important for generating realistic images and for scene understanding. Existing 3D Gaussian Splatting (3DGS) methods over-fit when dealing with dynamic objects, which leads to inaccurate geometry. The paper improves surface reconstruction quality by introducing geometric regularization and temporal cross-view consistency.

### Main Contributions

1. **Motion Information Extraction**: Based on the simple observation that 3DGS fails to model dynamic regions, the paper proposes extracting motion information from the appearance differences between rendered and ground-truth images.
2. **Distilling 2D Motion Priors into the Global Gaussian Space**: Using a time-dependent Gaussian model, the extracted 2D motion priors are distilled into the global Gaussian space in a differentiable manner, correcting the inaccurate properties of each Gaussian.
3. **Introducing Effective 3D Regularization and Temporal Cross-View Consistency**: Geometric regularization and temporal cross-view consistency yield physically plausible Gaussian ellipsoids, further improving the quality of decomposition and reconstruction.

### Method Overview

1. **Dynamic Mask Extraction (Stage 1)**:
   - Use a pre-trained foundation model to extract features of the rendered image and the ground-truth image.
   - Calculate the per-pixel feature difference to generate a dynamic mask.
   - Predict the dynamic property through a Multi-Layer Perceptron (MLP) decoder and generate a binary mask.
2. **Static-Dynamic Decomposition (Stage 2)**:
   - Use the dynamic mask extracted in Stage 1 to regularize the 2D velocity map.
   - Distinguish dynamic and static Gaussians with a simple threshold to achieve self-supervised decomposition.
3. **Surface Reconstruction**:
   - **Geometric Regularization**: Flatten each 3D Gaussian ellipsoid into a 2D disk by minimizing the scale along its shortest axis, so that it fits the object surface better.
   - **Normal Derivation**: Derive the normal vector directly from the scale vector instead of attaching a separate normal vector.
   - **Giant Gaussian Regularization**: Introduce a penalty term to prevent overly large Gaussian ellipsoids.
   - **Temporal-Spatial Consistency**: Use temporal cross-view information to enhance geometric consistency and reduce over-fitting.

### Experimental Results

- **Quantitative Results**: On the Waymo Open Dataset and the KITTI dataset, DeSiRe-GS achieves state-of-the-art performance in both image reconstruction and novel view synthesis.
- **Qualitative Analysis**: Visualizations show that DeSiRe-GS is significantly superior to other methods in static-dynamic decomposition and depth prediction.
- **Ablation Study**: Ablation experiments verify the effectiveness of each component, in particular the performance gains from the motion mask and normal supervision.

In conclusion, DeSiRe-GS achieves effective static-dynamic decomposition and high-quality surface reconstruction in autonomous driving scenes in a self-supervised manner, without additional 3D annotations.
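
As a rough illustration of the steps listed under Method Overview, the sketch below shows how a per-pixel feature difference can become a motion mask, how a velocity threshold can split Gaussians, and how the scale-based regularizers might be written. It is not the authors' implementation. All function names, the cosine-distance measure, and the threshold values are assumptions made for this example. In particular, a fixed threshold stands in for the MLP decoder that the summary describes, and the rotation of normals into the world frame is omitted.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def motion_mask(rendered_feats, gt_feats, threshold=0.5):
    """Stage 1 (sketch): per-pixel feature difference -> binary mask.

    Inputs are H x W grids (nested lists) of feature vectors.
    Assumption: a fixed threshold replaces the learned MLP decoder.
    """
    return [
        [cosine_distance(r, g) > threshold for r, g in zip(row_r, row_g)]
        for row_r, row_g in zip(rendered_feats, gt_feats)
    ]


def split_by_velocity(velocities, threshold=0.1):
    """Stage 2 (sketch): mark a Gaussian dynamic if its speed exceeds a threshold."""
    return [math.sqrt(sum(c * c for c in v)) > threshold for v in velocities]


def flatten_loss(scales):
    """Mean of each Gaussian's smallest scale.

    Minimizing this pushes ellipsoids toward flat disks.
    """
    return sum(min(s) for s in scales) / len(scales)


def local_normal(scale):
    """Unit normal in the Gaussian's local frame: the axis with the smallest scale."""
    k = min(range(3), key=lambda i: scale[i])
    return [1.0 if i == k else 0.0 for i in range(3)]


def giant_penalty(scales, max_scale=1.0):
    """Mean excess of each Gaussian's largest scale over max_scale (hypothetical cap)."""
    return sum(max(0.0, max(s) - max_scale) for s in scales) / len(scales)
```

In a training loop, `flatten_loss` and `giant_penalty` would typically be computed on differentiable tensors and added to the rendering loss with small weights. The pure-Python version here only makes the arithmetic explicit.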