Abstract: Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements for effective reconstruction, limiting their applications for in-the-wild scenarios. To facilitate efficient 3D scene reconstruction without costly annotations, we propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency. We represent each scene with 3D Gaussians to preserve the explicitness and further accompany them with a spatial-temporal field network to compactly model the 4D dynamics. We conduct extensive experiments on the challenging Waymo-Open dataset to evaluate the effectiveness of our method. Our $\textit{S}^3$Gaussian demonstrates the ability to decompose static and dynamic scenes and achieves the best performance without using 3D annotations. Code is available at: https://github.com/nnanhuang/S3Gaussian/

What problem does this paper attempt to address?

This paper proposes S3Gaussian (Self-Supervised Street Gaussians), a method for 3D street scene reconstruction in autonomous driving. Neural Radiance Fields (NeRF) are effective for driving scenes, but 3D Gaussian Splatting (3DGS) is faster and represents the scene more explicitly. Most existing street 3DGS methods, however, require tracked 3D vehicle bounding boxes to decompose static and dynamic elements, which limits their use in in-the-wild scenarios.

S3Gaussian takes a self-supervised approach: it decomposes dynamic and static elements from 4D consistency, without costly annotations. It represents each scene with 3D Gaussians to keep the representation explicit, and compactly models the 4D dynamics with a spatiotemporal field network. This network consists of a multi-resolution Hexplane structure encoder and a multi-head Gaussian decoder, which together handle complex spatiotemporal deformations and separate static from dynamic scene content.

The main contributions of the paper include:

1. Introducing S3Gaussian, the first self-supervised method that can decompose dynamic and static 3D Gaussians in street scenes without additional manual annotation.
2. Introducing an efficient spatiotemporal decomposition network that automatically captures deformations of 3D Gaussians.
3. Conducting extensive experiments on challenging datasets to demonstrate that S3Gaussian outperforms existing methods in scene reconstruction and novel view synthesis tasks, without relying on 3D annotations.

In this way, S3Gaussian enables high-fidelity, real-time neural rendering of dynamic urban street scenes for autonomous driving simulation without 3D supervision. It addresses the limitations of existing methods in training time, rendering speed, and the ability to distinguish dynamic from static scene content.
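The Hexplane encoder and multi-head decoder described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: it assumes the common Hexplane formulation in which features from six 2D planes (xy, xz, yz, xt, yt, zt) are bilinearly interpolated, multiplied element-wise, and concatenated across resolutions. The grid sizes, channel count, random initialisation, and head names (`d_position`, `d_color`) are all hypothetical choices made for the example.

```python
import random

# The six axis pairs of a Hexplane over (x, y, z, t):
# three purely spatial planes and three spatio-temporal planes.
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def make_plane(res, channels, rng):
    """A res x res grid of feature vectors (random stand-in for learned values)."""
    return [[[rng.uniform(0.5, 1.5) for _ in range(channels)]
             for _ in range(res)] for _ in range(res)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at normalised (u, v) in [0, 1]."""
    res = len(plane)
    fu, fv = u * (res - 1), v * (res - 1)
    i0, j0 = min(int(fu), res - 2), min(int(fv), res - 2)
    du, dv = fu - i0, fv - j0
    return [
        (1 - du) * (1 - dv) * plane[i0][j0][c]
        + du * (1 - dv) * plane[i0 + 1][j0][c]
        + (1 - du) * dv * plane[i0][j0 + 1][c]
        + du * dv * plane[i0 + 1][j0 + 1][c]
        for c in range(len(plane[0][0]))
    ]


def hexplane_features(planes_by_res, xyzt):
    """Per resolution: product of the six plane features; then concatenate."""
    feats = []
    for planes in planes_by_res:
        prod = None
        for plane, (a, b) in zip(planes, PLANE_AXES):
            f = bilinear(plane, xyzt[a], xyzt[b])
            prod = f if prod is None else [p * q for p, q in zip(prod, f)]
        feats.extend(prod)
    return feats


def make_head(out_dim, in_dim, rng):
    """One linear head (hypothetical; a real decoder would likely be an MLP)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def decode(heads, feats):
    """Multi-head decoding: each head maps the shared feature to one offset."""
    return {name: [sum(w * f for w, f in zip(row, feats)) for row in weights]
            for name, weights in heads.items()}


rng = random.Random(0)
channels, resolutions = 4, (4, 8)  # assumed sizes, for illustration only
planes_by_res = [[make_plane(r, channels, rng) for _ in PLANE_AXES]
                 for r in resolutions]
in_dim = channels * len(resolutions)
heads = {"d_position": make_head(3, in_dim, rng),
         "d_color": make_head(3, in_dim, rng)}

# Query one Gaussian centre at normalised (x, y, z) and time t.
offsets = decode(heads, hexplane_features(planes_by_res, [0.2, 0.5, 0.7, 0.1]))
print(offsets["d_position"])
```

Because the three spatial planes do not depend on t, a Gaussian whose decoded offsets barely change over time behaves as static, which gives one intuition for how such a field can separate static from dynamic content without bounding-box labels. How the paper actually enforces this separation is not specified in the text above.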

$\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations

Scene reconstruction techniques for autonomous driving: a review of 3D Gaussian splatting

GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

3D StreetUnveiler with Semantic-Aware 2DGS

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians