RD-NeRF: Neural Robust Distilled Feature Fields for Sparse-View Scene Segmentation

Yongjia Ma, Bin Dou, Tianyu Zhang, Zejian Yuan
DOI: https://doi.org/10.1109/icassp48485.2024.10447068
2024-01-01
Abstract: We propose Neural Robust Distilled Feature Fields (RD-NeRF) for robust 3D semantic feature distillation and 3D-consistent scene segmentation with sparse-view labels. Specifically, we introduce a two-stage pipeline. In the distillation stage, we employ a pre-trained image feature extractor, DINO-ViT, as the teacher network. RD-NeRF distills semantic knowledge into 3D space and uses the Vector-Matrix (VM) tensor decomposition method to represent the semantic field with volumetric rendering. During training, we use distance-wise and angle-wise distillation losses. This enables the student network to capture high-level semantics, enhances scene reconstruction and segmentation performance, and improves the robustness and effectiveness of distillation. In the segmentation stage, hash features and distilled semantic features are fed to the segmentation MLP, which is supervised by the sparse-view labels. The experimental results demonstrate that our model performs well in 3D-consistent scene segmentation under sparse-view supervision.
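The abstract names distance-wise and angle-wise distillation losses without giving their formulas. The following is a minimal sketch, assuming they follow the relational knowledge distillation formulation: pairwise distances normalized by their mean, and cosines of angles over sample triplets, each compared between teacher and student with a Huber penalty. The function names, the Huber choice, and the plain-Python feature lists are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import permutations


def huber(x, y, delta=1.0):
    """Huber penalty between two scalars (assumed comparison function)."""
    d = abs(x - y)
    return 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)


def pairwise_distances(feats):
    """Euclidean distance for every ordered pair, divided by the mean distance."""
    n = len(feats)
    dists = {(i, j): math.dist(feats[i], feats[j])
             for i in range(n) for j in range(n) if i != j}
    mean = sum(dists.values()) / len(dists) if dists else 0.0
    if mean > 0:
        dists = {k: v / mean for k, v in dists.items()}
    return dists


def triplet_cosines(feats):
    """Cosine of the angle at vertex j between points i and k, for every triplet."""
    cosines = {}
    for i, j, k in permutations(range(len(feats)), 3):
        u = [a - b for a, b in zip(feats[i], feats[j])]
        v = [a - b for a, b in zip(feats[k], feats[j])]
        norm = math.hypot(*u) * math.hypot(*v)
        dot = sum(a * b for a, b in zip(u, v))
        cosines[(i, j, k)] = dot / norm if norm > 0 else 0.0
    return cosines


def relational_loss(teacher, student, relation):
    """Mean Huber penalty between teacher and student relation values."""
    rt, rs = relation(teacher), relation(student)
    if not rt:  # too few samples to form a pair or a triplet
        return 0.0
    return sum(huber(rt[k], rs[k]) for k in rt) / len(rt)


def distance_loss(teacher, student):
    return relational_loss(teacher, student, pairwise_distances)


def angle_loss(teacher, student):
    return relational_loss(teacher, student, triplet_cosines)
```

Because both terms compare relations between samples rather than raw feature values, the teacher and student features may have different dimensions (for example, 2-D teacher features against 3-D student features), as long as both lists contain the same number of samples in the same order.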