DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting

Luis Wiedmann, Luca Wiehe, David Rozenberszki
2024-12-15
Abstract: Open-set 3D segmentation represents a major point of interest for multiple downstream robotics and augmented/virtual reality applications. Recent advances introduce 3D Gaussian Splatting as a computationally efficient representation of the underlying scene. They enable the rendering of novel views while achieving real-time display rates and matching the quality of computationally far more expensive methods. We present a decoupled 3D segmentation pipeline to ensure modularity and adaptability to novel 3D representations and semantic segmentation foundation models. The pipeline proposes class-agnostic masks based on a 3D reconstruction of the scene. Given the resulting class-agnostic masks, we use a class-aware 2D foundation model to add class annotations to the 3D masks. We test this pipeline with 3D Gaussian Splatting and different 2D segmentation models and achieve better performance than more tailored approaches while also significantly increasing the modularity.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses **open-set 3D semantic segmentation**. The authors propose a decoupled 3D semantic segmentation method, DCSEG, to tackle the following challenges:

1. **Scarcity of 3D data**: Far less 3D scene data is available than 2D image data, which makes it difficult to train an accurate segmentation network directly on 3D data.
2. **Flexibility and efficiency of the 3D representation**: Existing 3D representations (such as NeRF or point clouds) suffer from high computational cost or limited flexibility.
3. **Instance and part segmentation**: How to perform instance segmentation and part segmentation together in a 3D scene and aggregate the results into meaningful semantic categories.
4. **Open-vocabulary segmentation**: How to handle categories not seen beforehand, so that the model can adapt as new categories appear.

### Main contributions of DCSEG

- **3D Gaussian Splatting as the scene representation**: DCSEG uses 3D Gaussian Splatting as the underlying representation, which is more computationally efficient and more flexible than NeRF-based methods.
- **Decoupled segmentation pipeline**: The 3D reconstruction and semantic segmentation modules are decoupled so that each can be optimized independently, which improves modularity and adaptability.
- **Class-agnostic and class-aware mask generation**: The pipeline first generates class-agnostic 3D masks and then assigns class labels to them using a multi-view 2D semantic segmentation network, yielding class-aware 3D instance and part segmentation.
- **Flexibility and extensibility**: The 3D representation or the 2D segmentation model can be swapped out without retraining the entire system.

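
The label-assignment step described above (class-agnostic 3D masks that receive class labels from 2D predictions across several views) can be illustrated with a small sketch. This is not the authors' implementation: the majority-vote aggregation, the function name `vote_mask_labels`, the `BACKGROUND` sentinel, and the toy grids are all assumptions chosen for illustration. It assumes each view provides a rendered map of 3D mask IDs and a per-pixel class map from some 2D segmentation model.

```python
from collections import Counter, defaultdict

# Hypothetical sentinel for pixels not covered by any 3D mask.
BACKGROUND = -1


def vote_mask_labels(mask_id_views, class_id_views, ignore_id=BACKGROUND):
    """Assign one class label per 3D mask by majority vote over all views.

    mask_id_views:  list of 2D grids; each cell holds the ID of the
                    class-agnostic 3D mask rendered at that pixel.
    class_id_views: list of 2D grids with matching shapes; each cell holds
                    the class predicted by a 2D segmentation model.
    """
    votes = defaultdict(Counter)
    for mask_ids, class_ids in zip(mask_id_views, class_id_views):
        for mask_row, class_row in zip(mask_ids, class_ids):
            for mask_id, class_id in zip(mask_row, class_row):
                if mask_id != ignore_id:
                    # Each pixel casts one vote for its 2D class.
                    votes[mask_id][class_id] += 1
    # Keep the most frequently predicted class for every mask.
    return {m: c.most_common(1)[0][0] for m, c in votes.items()}


# Two toy 2x3 views of a scene containing two 3D masks (IDs 0 and 1).
mask_views = [
    [[0, 0, 1], [0, 1, 1]],
    [[0, 1, 1], [BACKGROUND, 1, 1]],
]
class_views = [
    [["chair", "chair", "table"], ["chair", "table", "table"]],
    [["chair", "chair", "table"], ["floor", "table", "table"]],
]

labels = vote_mask_labels(mask_views, class_views)
print(labels)  # {0: 'chair', 1: 'table'}
```

Because the vote only consumes per-pixel class maps, a different 2D model can be substituted without touching the 3D masks, which is the kind of modularity the decoupled design aims for. A real pipeline would likely weight votes by confidence or visibility; that choice is left open here.
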
### Summary

DCSEG targets efficient, flexible, and accurate 3D semantic segmentation in open-set scenarios. By combining a 3D Gaussian Splatting representation with a decoupled segmentation pipeline, it reports better performance than more tailored approaches while offering greater modularity and adaptability.