HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao

2024-03-19

Abstract: Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach can render novel viewpoints in real time, yield accurate 2D and 3D semantic information, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.
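The joint rendering of appearance and semantics described in the abstract can be illustrated with the front-to-back alpha compositing used by standard 3D Gaussian Splatting. The sketch below is a minimal NumPy illustration for a single pixel, assuming that each Gaussian's semantic logits are blended with the same weights as its color; the function name `composite` and its inputs (per-pixel opacities already projected and sorted near-to-far) are hypothetical and are not the authors' code.

```python
import numpy as np


def composite(alphas, colors, sem_logits):
    """Front-to-back alpha compositing of per-Gaussian attributes at one pixel.

    alphas:     (N,) opacity of each Gaussian at this pixel, sorted near-to-far.
    colors:     (N, 3) RGB color of each Gaussian.
    sem_logits: (N, K) semantic logits of each Gaussian over K classes.
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * trans  # w_i = alpha_i * T_i
    rgb = weights @ np.asarray(colors, dtype=float)
    sem = weights @ np.asarray(sem_logits, dtype=float)
    return rgb, sem, weights


# Usage: two half-transparent Gaussians, a red "road" one in front of a green "car" one.
rgb, sem, weights = composite(
    alphas=[0.5, 0.5],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    sem_logits=[[2.0, 0.0], [0.0, 2.0]],
)
label = int(np.argmax(sem))  # class with the highest blended logit
```

In this sketch the same weights produce both the RGB value and the semantic logits in one pass, which is what lets appearance and semantics share a single differentiable renderer and be optimized together.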

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses holistic 3D understanding of urban scenes, with particular attention to understanding and reconstructing dynamic scenes. The authors propose HUGS (Holistic Urban 3D Scene Understanding via Gaussian Splatting), which represents urban environments with 3D Gaussians. The main objectives of HUGS are:

1. **Holistic understanding of urban scenes from RGB images**: synthesizing novel views, parsing semantic labels, and tracking moving objects.
2. **No additional ground-truth input**: existing methods often require LiDAR scans or manually annotated 3D bounding boxes as extra input, whereas HUGS aims to achieve these functionalities from RGB images, relying on (possibly very noisy) predicted 3D bounding boxes instead of manual annotations.
3. **Real-time rendering**: generating novel views in real time while maintaining accuracy, which is crucial for applications such as autonomous driving.

The core idea of HUGS is to represent the scene as a combination of static and dynamic 3D Gaussians while jointly optimizing geometry, appearance, semantics, and motion. Dynamic objects such as vehicles receive special treatment: their poses are regularized with physical constraints (e.g., a unicycle motion model), which suppresses noise in the box predictions and improves dynamic scene reconstruction.

Experiments on KITTI, KITTI-360, and Virtual KITTI 2 show that HUGS achieves state-of-the-art results in novel view synthesis, novel view semantic synthesis, and 3D semantic reconstruction, including in dynamic scenes. The method is also robust, maintaining strong performance even when the 3D bounding box predictions are very noisy.
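To make the physical constraint more concrete, here is a minimal sketch of unicycle kinematics, assuming a z-up world frame in which vehicles move on the x-y ground plane and the speed `v` and yaw rate `omega` are constant within each frame step. The functions `unicycle_rollout`, `object_to_world`, and `pose_fit_loss` are hypothetical names for illustration; the paper's exact parameterization and loss may differ. The rollout generates a smooth trajectory, the transform places a vehicle's object-frame Gaussian centers into the world at a given frame, and the loss measures how far noisy detected box poses deviate from the physically consistent trajectory.

```python
import numpy as np


def unicycle_rollout(x0, y0, theta0, v, omega, dt=1.0):
    """Integrate unicycle kinematics exactly, with (v, omega) constant per step.

    Returns an array of shape (T + 1, 3) whose rows are (x, y, yaw).
    """
    x, y, th = float(x0), float(y0), float(theta0)
    states = [(x, y, th)]
    for v_t, w_t in zip(v, omega):
        th_next = th + w_t * dt
        if abs(w_t) < 1e-8:
            # Straight-line limit as omega -> 0.
            x += v_t * dt * np.cos(th)
            y += v_t * dt * np.sin(th)
        else:
            # Exact arc integration of dx/dt = v cos(yaw), dy/dt = v sin(yaw).
            x += v_t / w_t * (np.sin(th_next) - np.sin(th))
            y -= v_t / w_t * (np.cos(th_next) - np.cos(th))
        th = th_next
        states.append((x, y, th))
    return np.array(states)


def object_to_world(points_obj, state, z=0.0):
    """Rigidly move object-frame Gaussian centers (M, 3) into the world frame.

    Uses the ground-plane pose (x, y, yaw); `z` is a hypothetical constant height.
    """
    x, y, th = state
    c, s = np.cos(th), np.sin(th)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return np.asarray(points_obj, dtype=float) @ rot.T + np.array([x, y, z])


def pose_fit_loss(states, noisy_boxes):
    """Mean squared deviation of noisy box poses (x, y, yaw) from the rollout."""
    d = np.asarray(states, dtype=float) - np.asarray(noisy_boxes, dtype=float)
    d[:, 2] = np.arctan2(np.sin(d[:, 2]), np.cos(d[:, 2]))  # wrap yaw error into [-pi, pi]
    return float(np.mean(d ** 2))


# Usage: a car driving straight at 1 unit per frame, compared with jittered boxes.
traj = unicycle_rollout(0.0, 0.0, 0.0, v=[1.0, 1.0], omega=[0.0, 0.0])
noisy = traj + np.array([[0.0, 0.1, 0.0], [0.05, -0.1, 0.02], [0.0, 0.0, -0.03]])
print(traj[-1], pose_fit_loss(traj, noisy))
centers_world = object_to_world([[1.0, 0.0, 0.0]], traj[-1])
```

Because the trajectory in this sketch is generated from (v, omega) rather than from independent per-frame poses, the vehicle always moves along its heading, so fitting these parameters to noisy boxes cannot yield physically implausible motion such as sideways sliding or sudden jumps.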


GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs

HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes

Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction

GS3LAM: Gaussian Semantic Splatting SLAM

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes

GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

3D Vision-Language Gaussian Splatting

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy

3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting

SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain

FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Enhanced 3D Urban Scene Reconstruction and Point Cloud Densification using Gaussian Splatting and Google Earth Imagery