Abstract: This paper targets interactive object-level editing (e.g., deletion, recoloring, transformation, composition) in dynamic scenes. Recently, some methods aiming for flexible editing of static scenes represented by neural radiance fields (NeRF) have shown impressive synthesis quality, while similar capabilities in time-variant dynamic scenes remain limited. To solve this problem, we propose 4D-Editor, an interactive semantic-driven editing framework that allows editing multiple objects in a dynamic NeRF with user strokes on a single frame. We extend the original dynamic NeRF with a hybrid semantic feature distillation to maintain spatial-temporal consistency after editing. In addition, we design Recursive Selection Refinement, which significantly boosts object segmentation accuracy within a dynamic NeRF to aid the editing process. Moreover, we develop Multi-view Reprojection Inpainting to fill holes caused by incomplete scene capture after editing. Extensive experiments and editing examples on real-world scenes demonstrate that 4D-Editor achieves photo-realistic editing on dynamic NeRFs. Project page: https://patrickddj.github.io/4D-Editor
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
The paper addresses interactive object-level editing in dynamic scenes. Specifically, the authors propose a framework called 4D-Editor, which allows users to edit multiple objects in a dynamic neural radiance field (NeRF) by drawing strokes on a single reference frame. The supported editing operations include, but are not limited to, deletion, recoloring, transformation, and composition.
### Background and Motivation
In recent years, some methods have demonstrated flexible editing of static scenes represented by neural radiance fields (NeRF), but similar capabilities in time-varying dynamic scenes remain limited. Editing a dynamic scene requires maintaining both spatial and temporal consistency, which is technically more challenging. The authors therefore propose 4D-Editor to address object-level editing in dynamic NeRF.
### Main Contributions
1. **4D-Editor Framework**: To the best of the authors' knowledge, this is the first interactive editing framework for editing multiple objects in dynamic NeRF through user strokes on 2D images. 4D-Editor can maintain spatiotemporal consistency throughout the dynamic scene and supports various editing operations such as deletion, recoloring, transformation, and composition.
2. **Hybrid Semantic Feature Distillation**: To maintain spatiotemporal consistency after editing, the authors introduce a hybrid semantic feature distillation method, which extracts semantic information in 4D space from a pre-trained DINO model and integrates it into a hybrid semantic radiance field to assist in object segmentation and editing processes.
3. **Recursive Selection Refinement**: The authors propose a recursive selection refinement method that can quickly and accurately select target objects in dynamic NeRF. Experimental results validate the accuracy and efficiency of this method.
4. **Multi-view Reprojection Inpainting**: To fill holes caused by incomplete scene capture, the authors develop a multi-view reprojection inpainting strategy. This strategy completes the regions that are visible from other viewpoints by reprojecting those observations, and uses an inpainting model to generate the regions that are never observed, thereby maintaining visual consistency after editing.
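
The distillation idea in contribution 2 can be illustrated with a minimal NumPy sketch. It assumes that a semantic field returns a feature vector per ray sample, that features are accumulated along the ray with the standard volume-rendering weights, and that the rendered feature is compared to a teacher feature (for example, a DINO feature for that pixel) with a squared L2 loss. The function names (`render_weights`, `render_feature`, `distillation_loss`) are hypothetical, and the way 4D-Editor blends its static and dynamic fields is not specified in this summary, so the sketch handles a single field only.

```python
import numpy as np

def render_weights(sigma, delta):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: fraction of light that reaches sample i without being absorbed.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    return trans * alpha

def render_feature(sigma, delta, feats):
    """Accumulate per-sample semantic features (N, D) along one ray into a (D,) vector."""
    w = render_weights(sigma, delta)
    return (w[:, None] * feats).sum(axis=0)

def distillation_loss(rendered, target):
    """Squared L2 distance between the rendered feature and the teacher feature."""
    return float(np.sum((rendered - target) ** 2))

# Toy ray with three samples: empty space, an opaque surface, then a hidden sample.
sigma = np.array([0.0, 1e9, 1.0])
delta = np.ones(3)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rendered = render_feature(sigma, delta, feats)
loss = distillation_loss(rendered, np.array([0.0, 1.0]))
```

In this toy ray the opaque second sample receives all of the weight, so the rendered feature equals that sample's feature and the loss against a matching teacher feature is zero.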
### Method Overview
1. **Preliminary: Hybrid Radiance Field Reconstruction of Dynamic Scenes**: The hybrid radiance field representation of a dynamic scene typically consists of a static radiance field \( F_s \) and a dynamic radiance field \( F_d \). These two fields model the radiance of the static background and the dynamic foreground, respectively, and the final pixel color is obtained through volume rendering.
2. **Hybrid Semantic Feature Distillation**: To maintain spatiotemporal consistency during object-level editing, the authors use two semantic fields \( G_s \) and \( G_d \) to store the semantic features of the static and dynamic parts, respectively. The target features are extracted from a pre-trained DINO model, and the fields learn the scene semantics by minimizing the difference between their predicted features and these DINO features, which serve as ground truth.
3. **Recursive Selection Refinement**: Users can mark target objects on the reference frame, and 4D-Editor extracts target 2D semantic features based on user strokes and constructs different queries to match multiple objects in the semantic field. To improve the accuracy of feature matching, the authors use K-Means clustering and a recursive selection refinement algorithm to precisely locate target objects.
4. **Editing Module**: The editing module supports various editing operations, including deletion, filtering, composition, and recoloring. For example, deleting an object means setting it to be transparent to reveal the background behind it; recoloring changes the object's color properties by adjusting the RGB channels.
5. **Multi-view Reprojection Inpainting**: Because the scene is only partially observed, deletion operations may create "holes" in novel views. The proposed multi-view reprojection inpainting method fills these holes using observations from multiple viewpoints, ensuring that the edited scene remains consistent across different views.
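
The selection and editing steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: stroke features are clustered with a small hand-written K-Means, each cluster center becomes a query, and a 3D sample is selected when its cosine similarity to any query exceeds a threshold. The recursive refinement itself is not described in enough detail here, so the sketch performs a single matching pass. All function names, the threshold value, and the per-channel `gain` parameter are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Tiny K-Means on an (N, D) array; returns (k, D) cluster centers."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

def select(point_feats, stroke_feats, k=2, thresh=0.8):
    """Boolean mask of samples whose feature matches any stroke query."""
    queries = normalize(kmeans(stroke_feats, k))
    sim = normalize(np.asarray(point_feats, dtype=float)) @ queries.T
    return sim.max(axis=1) >= thresh

def delete(sigma, mask):
    """Deletion: make selected samples transparent by zeroing their density."""
    return np.where(mask, 0.0, sigma)

def recolor(rgb, mask, gain):
    """Recoloring: scale the RGB channels of selected samples, clipped to [0, 1]."""
    out = np.array(rgb, dtype=float)
    out[mask] = np.clip(out[mask] * gain, 0.0, 1.0)
    return out

# Toy example: two strokes on one object, two scene samples.
strokes = np.array([[1.0, 0.0], [0.9, 0.1]])
points = np.array([[1.0, 0.0], [0.0, 1.0]])
mask = select(points, strokes, k=1)
new_sigma = delete(np.array([2.0, 3.0]), mask)
new_rgb = recolor(np.array([[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]]), mask,
                  np.array([2.0, 1.0, 0.0]))
```

Zeroing the density of the selected samples is what exposes the background behind a deleted object. The same mask could drive the other edits, since each one only changes how the selected samples are rendered.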
### Experimental Results
The authors conducted experiments on three datasets: Dynamic View Synthesis, DAVIS, and NeuPhysics. The results show that 4D-Editor achieves clean and continuous object deletion while also supporting other editing operations such as recoloring and transformation. Additionally, the multi-view reprojection inpainting method significantly improves the quality of the inpainted results.
### Conclusion
4D-Editor addresses the challenges of object-level editing in dynamic NeRF by introducing hybrid semantic feature distillation, recursive selection refinement, and multi-view reprojection inpainting, providing a solution for dynamic scene editing.