LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes

Juliette Marrie, Romain Ménégaux, Michael Arbel, Diane Larlus, Julien Mairal
2024-10-18
Abstract: We address the task of uplifting visual features or semantic masks from 2D vision models to 3D scenes represented by Gaussian Splatting. Whereas common approaches rely on iterative optimization-based procedures, we show that a simple yet effective aggregation technique yields excellent results. Applied to semantic masks from Segment Anything (SAM), our uplifting approach leads to segmentation quality comparable to the state of the art. We then extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results despite DINOv2 not being trained on millions of annotated masks like SAM.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to uplift 2D visual features or semantic masks into 3D scenes represented by Gaussian Splatting. Specifically, the authors propose a simple and effective aggregation technique that achieves this without relying on iterative optimization. The method is computationally efficient and applicable to any type of feature.

### Main problems

1. **Uplifting 2D visual features or semantic masks to 3D scenes**: Existing methods usually rely on iterative optimization to learn a scene-specific 3D representation by minimizing reprojection errors. These methods are computationally expensive and time-consuming.
2. **Improving segmentation quality**: The authors aim to match the segmentation quality of state-of-the-art methods with a simple aggregation technique, including when starting from DINOv2 features, which, unlike SAM, were not trained on millions of annotated masks.

### Solutions

- **Learning-free uplifting method**: The authors propose a training-free uplifting method that can be directly integrated into the rendering process. It maps 2D features onto the 3D Gaussians by weighted averaging, and normalization keeps the magnitude of the features unchanged.
- **Combined with graph diffusion**: For generic DINOv2 features, the authors introduce graph diffusion to incorporate 3D scene geometry, which improves results in segmentation tasks.

### Experimental verification

- **Segmentation tasks**: The authors verify the effectiveness of the method on multi-view segmentation tasks, using both semantic masks generated by SAM and DINOv2 features.
- **Quantitative results**: Experiments show that the method is comparable to existing state-of-the-art methods in segmentation quality, while being faster and having a smaller memory footprint.

### Summary

The main contributions of this paper are:

1. Proposing a simple, learning-free uplifting method that can be directly applied in the rendering process and performs especially well with semantic masks generated by SAM.
2. Demonstrating the effectiveness of DINOv2 features combined with graph diffusion in segmentation tasks, even though DINOv2 has not been specifically trained for segmentation.
3. Showing that the method can also generate high-resolution, general-purpose feature maps as useful by-products for other tasks.

Through these contributions, the authors show how to efficiently uplift 2D visual features to 3D scenes without relying on complex optimization, and obtain competitive segmentation results.
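
The two ideas described above, normalized weighted averaging and graph diffusion, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `weights` matrix (assumed here to hold each Gaussian's non-negative contribution to each pixel, pooled over views), and the `neighbors` lists (assumed to come from some 3D neighborhood graph) are all hypothetical. The diffusion step is a generic neighbor-averaging scheme, used only to show the principle. The code is plain Python for readability.

```python
def uplift_features(weights, features_2d):
    """Average 2D features onto Gaussians, normalizing by total weight.

    weights[i][p]  : assumed non-negative contribution of Gaussian i to pixel p
    features_2d[p] : feature vector (list of floats) at pixel p
    """
    dim = len(features_2d[0])
    uplifted = []
    for row in weights:
        total = sum(row)
        if total == 0:
            # Gaussian never contributes to any pixel: leave a zero feature.
            uplifted.append([0.0] * dim)
            continue
        acc = [0.0] * dim
        for w, feat in zip(row, features_2d):
            for c in range(dim):
                acc[c] += w * feat[c]
        # Dividing by the total weight keeps feature magnitudes comparable
        # to those of the 2D inputs.
        uplifted.append([a / total for a in acc])
    return uplifted


def diffuse(features, neighbors, steps=10, alpha=0.5):
    """Generic graph diffusion: blend each node with its neighbors' mean.

    neighbors[i] : indices of nodes adjacent to node i (hypothetical graph,
                   e.g. built from nearby Gaussians in 3D)
    """
    current = [list(f) for f in features]
    for _ in range(steps):
        updated = []
        for i, feat in enumerate(current):
            nbrs = neighbors[i]
            if not nbrs:
                updated.append(list(feat))  # isolated node stays unchanged
                continue
            mean = [sum(current[j][c] for j in nbrs) / len(nbrs)
                    for c in range(len(feat))]
            updated.append([(1 - alpha) * x + alpha * m
                            for x, m in zip(feat, mean)])
        current = updated
    return current


if __name__ == "__main__":
    weights = [[1.0, 1.0], [0.0, 2.0], [0.0, 0.0]]  # 3 Gaussians, 2 pixels
    pixels = [[1.0, 0.0], [0.0, 1.0]]               # 2-dimensional features
    print(uplift_features(weights, pixels))
    # [[0.5, 0.5], [0.0, 1.0], [0.0, 0.0]]
```

Because the aggregation is a single pass of weighted sums, it needs no gradient steps. This matches the contrast the summary draws with optimization-based uplifting.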