Abstract: Open-set 3D segmentation represents a major point of interest for multiple downstream robotics and augmented/virtual reality applications. Recent advances introduce 3D Gaussian Splatting as a computationally efficient representation of the underlying scene. They enable the rendering of novel views while achieving real-time display rates and matching the quality of computationally far more expensive methods. We present a decoupled 3D segmentation pipeline to ensure modularity and adaptability to novel 3D representations and semantic segmentation foundation models. The pipeline proposes class-agnostic masks based on a 3D reconstruction of the scene. Given the resulting class-agnostic masks, we use a class-aware 2D foundation model to add class annotations to the 3D masks. We test this pipeline with 3D Gaussian Splatting and different 2D segmentation models and achieve better performance than more tailored approaches while also significantly increasing the modularity.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses **open-set 3D semantic segmentation**. The authors propose a decoupled 3D semantic segmentation method named DCSEG to tackle the following challenges:

1. **Scarcity of 3D data**: Far less 3D scene data is available than 2D image data, which makes it difficult to train an accurate segmentation network directly on 3D data.
2. **Flexibility and efficiency of the 3D representation**: Existing 3D representations (such as NeRF or point clouds) are either computationally expensive or not flexible enough.
3. **Instance and part segmentation**: How to perform instance segmentation and part segmentation simultaneously in 3D scenes and aggregate the results into meaningful semantic categories.
4. **Open-vocabulary segmentation**: How to handle categories not seen during training, so that the model can adapt when new categories appear.

### Main contributions of DCSEG

- **3D Gaussian Splatting as the scene representation**: DCSEG uses 3D Gaussian Splatting as the underlying representation, which is more computationally efficient and more flexible than NeRF-based methods.
- **Decoupled segmentation pipeline**: The 3D reconstruction and semantic segmentation modules are decoupled so that each can be optimized independently, which improves modularity and adaptability.
- **Class-agnostic and class-aware mask generation**: The pipeline first generates class-agnostic 3D masks and then assigns class labels to these masks using a multi-view 2D semantic segmentation network, yielding class-aware 3D instance and part segmentation.
- **Flexibility and extensibility**: The 3D representation or the 2D segmentation model can be swapped out without retraining the entire system.

### Summary

This paper addresses several key problems in 3D semantic segmentation with the DCSEG method, targeting efficient, flexible, and accurate segmentation in open-set scenarios. By combining a 3D Gaussian Splatting representation with a decoupled segmentation pipeline, DCSEG is reported to outperform more tailored existing approaches while offering greater modularity and adaptability.
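
The label-assignment step described above (class-agnostic 3D masks receiving class labels from 2D predictions across several views) can be illustrated with a small sketch. This summary does not state the aggregation rule DCSEG actually uses, so the code below assumes a plain per-pixel majority vote. The function name `label_masks_by_voting`, the input layout (one rendered mask-ID map and one 2D class map per view), and the `-1` convention for uncovered pixels are all hypothetical choices made for illustration.

```python
import numpy as np


def label_masks_by_voting(mask_id_maps, class_maps, num_masks, num_classes):
    """Assign one class to each 3D mask by majority vote over all views.

    mask_id_maps: list of (H, W) int arrays; each pixel holds the ID of the
        class-agnostic 3D mask visible there, or -1 if none (assumed layout).
    class_maps: list of (H, W) int arrays with the 2D model's class per pixel.
    """
    votes = np.zeros((num_masks, num_classes), dtype=np.int64)
    for mask_ids, classes in zip(mask_id_maps, class_maps):
        valid = mask_ids >= 0  # skip pixels not covered by any 3D mask
        # Unbuffered add so repeated (mask, class) pairs are all counted.
        np.add.at(votes, (mask_ids[valid], classes[valid]), 1)
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1  # mask never observed in any view
    return labels, votes


# Toy example: two 2x2 views, three 3D masks, three candidate classes.
view_mask_ids = [
    np.array([[0, 0], [1, -1]]),
    np.array([[0, 1], [1, 1]]),
]
view_classes = [
    np.array([[2, 2], [0, 1]]),
    np.array([[1, 0], [0, 2]]),
]
labels, votes = label_masks_by_voting(
    view_mask_ids, view_classes, num_masks=3, num_classes=3
)
print(labels)  # mask 0 -> class 2, mask 1 -> class 0, mask 2 unobserved (-1)
```

Because the vote only consumes per-pixel class maps, any 2D segmentation model that produces such maps could be substituted without touching the 3D masks, which is the kind of modularity the decoupled design aims for.
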

DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting

Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks

SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition

GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

LineGS : 3D Line Segment Representation on 3D Gaussian Splatting

3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain

A New Split Algorithm for 3D Gaussian Splatting

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding

LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

SfM-Free 3D Gaussian Splatting via Hierarchical Training

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting