Abstract: We address the task of uplifting visual features or semantic masks from 2D vision models to 3D scenes represented by Gaussian Splatting. Whereas common approaches rely on iterative optimization-based procedures, we show that a simple yet effective aggregation technique yields excellent results. Applied to semantic masks from Segment Anything (SAM), our uplifting approach leads to segmentation quality comparable to the state of the art. We then extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results despite DINOv2 not being trained on millions of annotated masks like SAM.
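
The aggregation step can be pictured as a normalized weighted average. For each Gaussian i, the uplifted feature is the sum over all observed pixels p of w_i(p) F(p), divided by the sum of w_i(p). The minimal pure-Python sketch below assumes that the rasterizer reports, for every pixel, which Gaussians contribute to it and with what weight. It also assumes that these weights are the alpha-compositing weights alpha_i * T_i. The helper name `uplift_features` and the input layout are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple

# One pixel observation: the 2D feature vector at that pixel (e.g. a DINOv2 feature
# or a binary SAM mask value) and the (gaussian_id, weight) pairs reported by the
# rasterizer for that pixel. ASSUMPTION: weights are alpha-compositing weights alpha_i * T_i.
PixelObs = Tuple[Sequence[float], Sequence[Tuple[int, float]]]


def uplift_features(pixels: List[PixelObs], num_gaussians: int, dim: int) -> List[List[float]]:
    """Hypothetical helper: normalized weighted average of 2D features per Gaussian."""
    sums = [[0.0] * dim for _ in range(num_gaussians)]
    totals = [0.0] * num_gaussians
    for feature, contributions in pixels:
        for gid, w in contributions:
            totals[gid] += w
            for k in range(dim):
                sums[gid][k] += w * feature[k]
    # Dividing by the accumulated weight keeps each uplifted feature on the same
    # scale as the 2D inputs; Gaussians never observed keep a zero vector.
    return [
        [s / totals[g] for s in sums[g]] if totals[g] > 0 else [0.0] * dim
        for g in range(num_gaussians)
    ]


# Two pixels (possibly from different views) and three Gaussians.
pixels = [
    ([1.0, 0.0], [(0, 0.6), (1, 0.3)]),
    ([0.0, 1.0], [(0, 0.2), (1, 0.6)]),
]
uplifted = uplift_features(pixels, num_gaussians=3, dim=2)
print(uplifted)  # ≈ [[0.75, 0.25], [0.333, 0.667], [0.0, 0.0]]
```

The sketch needs a single pass over the rendered pixels and no gradient descent. When the inputs are binary masks, each uplifted value lies in [0, 1].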

What problem does this paper attempt to address?

The problem this paper addresses is how to uplift 2D visual features or semantic masks into 3D scenes represented by Gaussian Splatting. Specifically, the authors propose a simple and effective aggregation technique that achieves this without relying on iterative optimization. The method is computationally efficient and applicable to any type of feature.

### Main problems

1. **Uplifting 2D visual features or semantic masks to 3D scenes**: Existing methods usually rely on iterative optimization to learn a scene-specific 3D representation by minimizing a reprojection error. These procedures are computationally expensive and time-consuming.
2. **Improving segmentation quality**: The authors aim to match the segmentation quality of state-of-the-art methods with a simple aggregation technique. In the DINOv2 case, they also want to avoid relying on features trained on large-scale mask annotations.

### Solutions

- **Learning-free uplifting**: The authors propose a training-free uplifting method that integrates directly into the rendering process. It maps 2D features onto the 3D Gaussians by weighted averaging, with a normalization that keeps the feature magnitude unchanged.
- **Graph diffusion**: For generic DINOv2 features, the authors use graph diffusion to incorporate 3D scene geometry, which improves results on segmentation tasks.

### Experimental verification

- **Segmentation tasks**: The authors evaluate the method on multi-view segmentation tasks, using both semantic masks generated by SAM and DINOv2 features.
- **Quantitative results**: The method reaches segmentation quality comparable to state-of-the-art methods while being faster and using less memory.

### Summary

The main contributions of this paper are:

1. A simple, learning-free uplifting method that integrates directly into the rendering process and performs especially well on semantic masks generated by SAM.
2. A demonstration that DINOv2 features combined with graph diffusion are effective for segmentation, even though DINOv2 was not specifically trained for segmentation.
3. High-resolution, general-purpose feature maps produced as a by-product, which can be useful for other tasks.

Through these contributions, the authors show how to efficiently uplift 2D visual features to 3D scenes without complex optimization, while achieving segmentation results competitive with the state of the art.
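
Graph diffusion over the Gaussians can be illustrated with a generic sketch. Here the graph is a k-nearest-neighbor graph over Gaussian centers, and each edge weight is the clipped cosine similarity of the uplifted features. The update is a plain random-walk step x ← D⁻¹Wx, starting from initial per-Gaussian scores. The helper names (`knn_graph`, `diffuse`), the graph construction and the update rule are all illustrative assumptions and may differ from what the paper uses.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def knn_graph(positions, features, k) -> List[Dict[int, float]]:
    """Hypothetical helper: symmetric k-NN graph over Gaussian centers."""
    n = len(positions)
    graph: List[Dict[int, float]] = [{i: 1.0} for i in range(n)]  # self-loops
    for i in range(n):
        neighbors = sorted(
            (math.dist(positions[i], positions[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbors[:k]:
            # 3D geometry picks the neighbors; feature similarity sets the edge weight.
            w = max(cosine(features[i], features[j]), 0.0)
            graph[i][j] = w
            graph[j][i] = w
    return graph


def diffuse(graph, scores, steps):
    """Random-walk diffusion: each step replaces a score by its weighted neighbor average."""
    x = list(scores)
    for _ in range(steps):
        x = [sum(w * x[j] for j, w in row.items()) / sum(row.values()) for row in graph]
    return x


# Two spatially separated clusters of two Gaussians each; only Gaussian 0 is seeded.
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
features = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
graph = knn_graph(positions, features, k=1)
scores = diffuse(graph, [1.0, 0.0, 0.0, 0.0], steps=10)
print(scores)  # ≈ [0.5, 0.5, 0.0, 0.0]
```

In this sketch, the seed score spreads to the nearby Gaussian with a similar feature and does not leak into the distant cluster. A real scene would use a larger k and a spatial index instead of this O(n²) neighbor search.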

LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes
