Abstract: This paper targets interactive object-level editing (e.g., deletion, recoloring, transformation, composition) in dynamic scenes. Recent methods for flexibly editing static scenes represented by neural radiance fields (NeRF) have shown impressive synthesis quality, while similar capabilities in time-variant dynamic scenes remain limited. To solve this problem, we propose 4D-Editor, an interactive semantic-driven editing framework that allows editing multiple objects in a dynamic NeRF with user strokes on a single frame. We extend the original dynamic NeRF by incorporating hybrid semantic feature distillation to maintain spatial-temporal consistency after editing. In addition, we design Recursive Selection Refinement, which significantly boosts object segmentation accuracy within a dynamic NeRF to aid the editing process. Moreover, we develop Multi-view Reprojection Inpainting to fill holes caused by incomplete scene capture after editing. Extensive experiments and editing examples on real-world scenes demonstrate that 4D-Editor achieves photo-realistic editing on dynamic NeRFs. Project page: https://patrickddj.github.io/4D-Editor

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses interactive object-level editing in dynamic scenes. Specifically, the authors propose a framework called 4D-Editor, which allows users to edit multiple objects in a dynamic neural radiance field (NeRF) by drawing strokes on a single reference frame. Supported operations include, but are not limited to, deletion, recoloring, transformation, and composition.

### Background and Motivation

In recent years, several methods have demonstrated flexible editing of neural radiance fields (NeRF) in static scenes, but comparable capabilities in time-varying dynamic scenes remain limited. Editing a dynamic scene requires maintaining both spatial and temporal consistency, which is technically harder. The authors therefore propose 4D-Editor to address object-level editing in dynamic NeRF.

### Main Contributions

1. **4D-Editor Framework**: To the best of the authors' knowledge, this is the first interactive framework for editing multiple objects in a dynamic NeRF through user strokes on 2D images. 4D-Editor maintains spatiotemporal consistency throughout the dynamic scene and supports editing operations such as deletion, recoloring, transformation, and composition.
2. **Hybrid Semantic Feature Distillation**: To maintain spatiotemporal consistency after editing, the authors introduce a hybrid semantic feature distillation method. It extracts semantic information from a pre-trained DINO model and distills it into a hybrid semantic radiance field over 4D space, which supports object segmentation and editing.
3. **Recursive Selection Refinement**: The authors propose a recursive selection refinement method that quickly and accurately selects target objects in a dynamic NeRF. Experimental results validate its accuracy and efficiency.
4. **Multi-view Reprojection Inpainting**: To fill holes caused by incomplete scene capture, the authors develop a multi-view reprojection inpainting strategy. It completes the parts that are visible from other viewpoints by reprojection and uses an inpainting model to generate the parts that are never observed, thereby maintaining visual consistency after editing.

### Method Overview

1. **Preliminary: Hybrid Radiance Field Reconstruction of Dynamic Scenes**: The hybrid radiance field representation of a dynamic scene typically consists of a static radiance field \( F_s \) and a dynamic radiance field \( F_d \). These two fields model the radiance of the static background and the dynamic foreground, respectively, and the final pixel color is produced through volume rendering.
2. **Hybrid Semantic Feature Distillation**: To maintain spatiotemporal consistency during object-level editing, the authors use two semantic fields \( G_s \) and \( G_d \) to store the semantic features of the static and dynamic parts, respectively. The target features are extracted from a pre-trained DINO model, and the fields are trained by minimizing the difference between predicted features and these targets so that they capture scene semantics.
3. **Recursive Selection Refinement**: Users mark target objects on the reference frame. 4D-Editor extracts the 2D semantic features under the user strokes and constructs different queries to match multiple objects in the semantic field. To improve the accuracy of feature matching, the authors use K-Means clustering and a recursive selection refinement algorithm to precisely locate target objects.
4. **Editing Module**: The editing module supports various editing operations, including deletion, filtering, composition, and recoloring. For example, deleting an object means making it transparent to reveal the background behind it; recoloring changes the object's color by adjusting its RGB channels.
5. **Multi-view Reprojection Inpainting**: Because the scene is only partially observed, deletion may create "holes" in novel views. The proposed multi-view reprojection inpainting method fills these holes using observations from multiple viewpoints, so that the edited scene remains consistent across views.

### Experimental Results

The authors conducted experiments on three datasets: Dynamic View Synthesis, DAVIS, and NeuPhysics. The results show that 4D-Editor achieves clean and temporally continuous object deletion while also supporting other editing operations such as recoloring and transformation. In addition, the authors report that multi-view reprojection inpainting significantly improves the quality of the inpainted results.

### Conclusion

4D-Editor addresses the challenges of object-level editing in dynamic NeRF by combining hybrid semantic feature distillation, recursive selection refinement, and multi-view reprojection inpainting, providing a solution for dynamic scene editing.
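
The selection and deletion steps in the Method Overview can be illustrated with a toy example on a single ray. This is only a minimal sketch under stated assumptions, not the authors' implementation: the samples are assumed to be already merged from the static and dynamic fields, each sample carries a small hypothetical semantic feature vector, and a single cosine-similarity threshold stands in for the K-Means queries and recursive refinement, whose details are not given here. The names `cosine`, `select`, `render`, and the `threshold` and `delta` parameters are all illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select(samples, query, threshold=0.9):
    """Flag samples whose semantic feature matches the stroke query.

    Assumption: one fixed threshold replaces the paper's clustering
    and recursive refinement.
    """
    return [cosine(s["feat"], query) >= threshold for s in samples]


def render(samples, deleted=None, delta=1.0):
    """Alpha-composite samples front to back along one ray.

    Samples flagged in `deleted` are given zero density, i.e. they
    become transparent and reveal whatever lies behind them.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for i, s in enumerate(samples):
        sigma = 0.0 if deleted and deleted[i] else s["sigma"]
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        color = [c + weight * rgb for c, rgb in zip(color, s["rgb"])]
        transmittance *= 1.0 - alpha
    return color


# Hypothetical ray: an opaque red foreground object in front of a blue background.
ray = [
    {"sigma": 50.0, "rgb": (1.0, 0.0, 0.0), "feat": (1.0, 0.0)},  # foreground
    {"sigma": 50.0, "rgb": (0.0, 0.0, 1.0), "feat": (0.0, 1.0)},  # background
]
stroke_query = (0.9, 0.1)  # made-up feature taken from under a user stroke

mask = select(ray, stroke_query)
before = render(ray)
after = render(ray, deleted=mask)
```

In this toy setup the query matches only the foreground sample, so the pixel is dominated by the red object before the edit and by the blue background after it. A real scene would also need hole filling wherever the background behind the deleted object was never observed.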

4D-Editor: Interactive Object-level Editing in Dynamic Neural Radiance Fields via Semantic Distillation
