Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields (NeRF) are versatile for traditional tasks such as novel view synthesis. Recent work aims to extend the functionality of NeRF beyond view synthesis to semantically aware tasks such as editing and segmentation, using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts that reduce feature quality. Recently, 3D Gaussian Splatting (3DGS) has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian Splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing, and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method provides comparable or better results while being significantly faster to both train and render. Additionally, to the best of our knowledge, ours is the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website: https://feature-3dgs.github.io/

RD-NERF: Neural Robust Distilled Feature Fields for Sparse-View Scene Segmentation

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models

SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

M^2DNeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

Depth-supervised NeRF: Fewer Views and Faster Training for Free

ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

Instance Neural Radiance Field

ONeRF: Unsupervised 3D Object Segmentation from Multiple Views

RDNeRF: Relative Depth Guided NeRF for Dense Free View Synthesis

NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion

HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs

SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

DSSMNeRF: Depth Self-supervised MVS NeRF

Drone-NeRF: Efficient NeRF based 3D scene reconstruction for large-scale drone survey

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Divide and Conquer: Rethinking the Training Paradigm of Neural Radiance Fields

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding