Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li

2024-08-23

Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Existing methods adopt neural rendering methods as 3D representations and jointly optimize color and semantic features to achieve rendering and scene understanding simultaneously. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is to distill knowledge from 2D pre-trained models to 3D Gaussians. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians; the projection is based on spatial relationships and needs no additional training. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. Quantitative results on ScanNet segmentation and LERF object localization demonstrate the superior performance of our method. Additionally, we explore several applications of Semantic Gaussians, including object part segmentation, instance segmentation, scene editing, and spatiotemporal segmentation, with better qualitative results than 2D and 3D baselines, highlighting its versatility and effectiveness in supporting diverse downstream tasks.
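The abstract says only that the projection is "based on spatial relationships" and needs no training. The sketch below illustrates the general idea of lifting 2D features onto 3D Gaussians under loudly stated assumptions; it is not the authors' projection. Each Gaussian center is transformed into every camera with a world-to-camera rotation `R` and translation `t`, projected with a pinhole model, and assigned the feature at the nearest pixel. The features from all views in which the center is visible are then averaged. Occlusion handling, Gaussian extent, and any feature fusion used in the paper are omitted. The names `lift_features` and `world_to_camera` and the toy feature maps are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]                  # 3x3 world-to-camera rotation
FeatureMap = Sequence[Sequence[Sequence[float]]]  # [height][width][channels]
View = Tuple[Mat3, Vec3, FeatureMap]              # (R, t, 2D feature map)


def world_to_camera(p: Vec3, R: Mat3, t: Vec3) -> Tuple[float, ...]:
    """Apply the rigid transform x_cam = R @ x_world + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def lift_features(
    centers: Sequence[Vec3],
    views: Sequence[View],
    intrinsics: Tuple[float, float, float, float],
) -> List[Optional[List[float]]]:
    """Average the 2D features that each Gaussian center projects onto.

    Hypothetical simplification: nearest-pixel lookup, no occlusion test,
    plain mean over every view in which the center lands inside the image.
    Returns None for Gaussians that are not visible in any view.
    """
    fx, fy, cx, cy = intrinsics
    lifted: List[Optional[List[float]]] = []
    for p in centers:
        total: Optional[List[float]] = None
        hits = 0
        for R, t, fmap in views:
            x, y, z = world_to_camera(p, R, t)
            if z <= 0.0:  # point is behind the camera
                continue
            # Pinhole projection, rounded to the nearest pixel.
            u = int(round(fx * x / z + cx))
            v = int(round(fy * y / z + cy))
            if not (0 <= v < len(fmap) and 0 <= u < len(fmap[0])):
                continue  # projects outside the image
            f = fmap[v][u]
            total = list(f) if total is None else [a + b for a, b in zip(total, f)]
            hits += 1
        lifted.append(None if total is None else [a / hits for a in total])
    return lifted
```

Because the result is a pure function of camera poses, intrinsics, and precomputed 2D feature maps, nothing is optimized. This mirrors the "no additional training" property the abstract describes, though the real method's correspondence rule may differ.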

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses open-vocabulary 3D scene understanding: interpreting the contents of a 3D scene from arbitrary natural-language queries rather than being limited to a predefined set of object categories. Open-vocabulary 3D scene understanding lets machines interact with their environment through natural language, enabling tasks such as object recognition, semantic scene reconstruction, and navigation in complex environments.

To this end, the authors propose Semantic Gaussians, an open-vocabulary scene understanding approach built on 3D Gaussian Splatting. Unlike existing methods, it uses a flexible projection method that maps various 2D semantic features from pre-trained image encoders onto a new semantic component of the 3D Gaussians, without additional training. The paper also builds a 3D semantic network that predicts this semantic component directly from raw 3D Gaussians, enabling fast inference. Quantitative results on ScanNet segmentation and LERF object localization demonstrate the method's superior performance. The paper further applies Semantic Gaussians to object part segmentation, instance segmentation, scene editing, and spatiotemporal segmentation, showcasing its versatility and effectiveness.
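Once every Gaussian carries a semantic feature, whether projected from 2D or predicted by the 3D semantic network, an open-vocabulary query can be answered by comparing those features with text embeddings. The summary above does not specify the scoring rule. This is a minimal sketch that assumes cosine similarity against text embeddings from a CLIP-style text encoder sharing the feature space of the 2D image encoder. The embeddings are not computed here; the toy 2D vectors, the label names, the threshold, and the helpers `label_gaussians` and `query_mask` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def label_gaussians(
    features: Sequence[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
) -> List[str]:
    """Assign each Gaussian the vocabulary label whose embedding is most similar."""
    return [
        max(text_embeddings, key=lambda name: cosine(f, text_embeddings[name]))
        for f in features
    ]


def query_mask(
    features: Sequence[Sequence[float]],
    query_embedding: Sequence[float],
    threshold: float,
) -> List[bool]:
    """Select Gaussians whose similarity to a single free-form query passes a threshold."""
    return [cosine(f, query_embedding) >= threshold for f in features]


if __name__ == "__main__":
    feats = [[1.0, 0.1], [0.0, 2.0], [0.6, 0.8]]          # toy per-Gaussian features
    vocab = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}    # toy text embeddings
    print(label_gaussians(feats, vocab))
    print(query_mask(feats, vocab["chair"], 0.9))
```

The argmax variant suits segmentation over a user-supplied label set. The thresholded mask suits a single free-form query, as in object localization. Because the vocabulary is just a dictionary of embeddings, new categories can be added at query time without retraining.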

InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

3D Vision-Language Gaussian Splatting

Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding

SLGaussian: Fast Language Gaussian Splatting in Sparse Views

CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding

SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis

FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding

SparseLGS: Sparse View Language Embedded Gaussian Splatting

Occam's LGS: A Simple Approach for Language Gaussian Splatting

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain

SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians

GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning