M^2DNeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

Ning Wang, Lefei Zhang, Angel X Chang
2024-01-01
Abstract: Neural fields (NeRF) have emerged as a promising approach for representing continuous 3D scenes. Nevertheless, the lack of semantic encoding in NeRFs poses a significant challenge for scene decomposition. To address this challenge, we present a single model, Multi-Modal Decomposition NeRF (M^2DNeRF), that is capable of both text-based and visual patch-based edits. Specifically, we use multi-modal feature distillation to integrate teacher features from pretrained visual and language models into 3D semantic feature volumes, thereby facilitating consistent 3D editing. To enforce consistency between the visual and language features in our 3D feature volumes, we introduce a multi-modal similarity constraint. We also introduce a patch-based joint contrastive loss that encourages object regions to coalesce in the 3D feature space, resulting in more precise boundaries. Experiments on various real-world scenes show superior performance in 3D scene decomposition tasks compared to prior NeRF-based methods.
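The abstract names feature distillation and a multi-modal similarity constraint but does not give their formulas. The sketch below is only one plausible reading, under stated assumptions: the distillation term is taken to be a mean squared error between rendered features and teacher features, and the similarity constraint is taken to match the pairwise cosine-similarity structure of the visual and language features (which allows the two feature spaces to have different dimensions). All function names are hypothetical and none of this is confirmed as the authors' formulation.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def distillation_loss(rendered, teacher):
    """Assumed form: mean squared error between rendered and teacher features.

    Both arguments are lists of per-sample feature vectors with matching shapes.
    """
    diffs = [
        (r - t) ** 2
        for rendered_vec, teacher_vec in zip(rendered, teacher)
        for r, t in zip(rendered_vec, teacher_vec)
    ]
    return sum(diffs) / len(diffs)


def similarity_matrix(feats):
    """Pairwise cosine similarities among a set of feature vectors."""
    return [[cosine(a, b) for b in feats] for a in feats]


def multimodal_similarity_loss(visual, language):
    """Assumed form: penalize disagreement between the pairwise similarity
    structure of visual features and that of language features.

    The two feature sets must cover the same samples in the same order,
    but may have different feature dimensions.
    """
    sim_visual = similarity_matrix(visual)
    sim_language = similarity_matrix(language)
    n = len(visual)
    total = sum(
        (sim_visual[i][j] - sim_language[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)
```

In this reading, two feature sets that group the same samples together incur almost no penalty even if their raw values and dimensions differ, while sets that group samples differently are penalized.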