ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing

Jun-Kun Chen, Yu-Xiong Wang
2024-11-08
Abstract: This paper proposes ProEdit - a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Inspired by the crucial observation that multi-view inconsistency in scene editing is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an adaptive 3D Gaussian splatting (3DGS) training strategy, ensuring high quality and efficiency in performing each subtask. Extensive evaluation shows that our ProEdit achieves state-of-the-art results in various scenes and challenging editing tasks, all through a simple framework without any expensive or sophisticated add-ons like distillation losses, components, or training procedures. Notably, ProEdit also provides a new way to control, preview, and select the "aggressivity" of editing operation during the editing process.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to achieve high-quality Instruction-Guided Scene Editing (IGSE) for 3D scenes. Current methods struggle with multi-view inconsistency and with edits that change geometry. Both difficulties stem mainly from the large Feasible Output Space (FOS) of diffusion models, which leads to dull colors, blurry textures, and geometric noise in the edited scenes. The paper proposes ProEdit, a framework that decomposes a complex editing task into several simpler subtasks and executes them progressively to control the size of the FOS, thereby reducing multi-view inconsistency and improving editing quality.

### Main contributions of the paper:

1. **New insights into subtask decomposition and progressive editing**: The paper shows that decomposing the editing task controls the size of the Feasible Output Space (FOS), which addresses the core challenge in 3D scene editing.
2. **Simple and effective framework**: ProEdit achieves high-quality scene editing by solving the subtasks one after another, without complex add-ons or training procedures, while letting users control, preview, and select the "aggressivity" of the editing operation.
3. **High-quality editing results across scenes and tasks**: ProEdit reaches state-of-the-art performance in a variety of scenes and challenging editing tasks.

### Method overview:

1. **Subtask definition based on interpolation**: Subtasks are defined through text-encoding interpolation, which makes the FOS of each subtask significantly smaller than that of the full task.
2. **Difficulty-aware subtask scheduler**: The editing task is decomposed dynamically according to subtask difficulty (approximately measured by FOS size), so that all subtasks have comparable difficulty.
3. **Adaptive 3D Gaussian Splatting (3DGS) method**: The scene is represented with 3DGS, and an adaptive Gaussian creation strategy ensures high-quality editing in each subtask until the entire editing task is complete.

### Experimental results:

- **Qualitative results**: In the comparative experiments on the Fangzhou and Face scenes, ProEdit produces high-quality edits at different aggressivity levels, with clear textures and precise shapes.
- **Quantitative evaluation**: Measured by user studies, GPT scores, and CLIP scores, ProEdit performs strongly in overall quality and 3D consistency. On the ScanNet++ dataset in particular, its results are comparable to or better than those of the more complex baseline ConsistDreamer.
- **Outdoor scene experiments**: In the "Bear" and "Floating Tree" scenes, ProEdit achieves high-quality editing and also provides aggressivity control, so users can preview and select intermediate results during the editing process.

In conclusion, ProEdit addresses the multi-view inconsistency and geometric change problems in 3D scene editing through a simple progressive editing method, providing a new solution for high-quality instruction-guided scene editing.
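
The first two items of the method overview (interpolation-based subtasks and difficulty-aware scheduling) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `interpolate_embedding` and `schedule_subtasks` are hypothetical, the linear interpolation form is an assumption, and the `difficulty` callable is a stand-in, since this summary does not specify how the paper measures FOS size. The scheduler greedily picks each next interpolation ratio as far ahead as possible while keeping the estimated difficulty of that step under a threshold, so all steps end up with comparable difficulty.

```python
from typing import Callable, List

import numpy as np


def interpolate_embedding(e_src: np.ndarray, e_tgt: np.ndarray, r: float) -> np.ndarray:
    """Linearly blend two text encodings (assumed form): r=0 is the source, r=1 the full edit."""
    return (1.0 - r) * e_src + r * e_tgt


def schedule_subtasks(
    difficulty: Callable[[float, float], float],
    threshold: float,
    tol: float = 1e-4,
) -> List[float]:
    """Greedily choose interpolation ratios 0 = r_0 < r_1 < ... < r_n = 1.

    `difficulty(a, b)` is a hypothetical estimate of how hard it is to move the
    scene from ratio a to ratio b; it is assumed to grow as b moves away from a.
    Each step advances as far as possible while staying within `threshold`.
    """
    ratios = [0.0]
    while ratios[-1] < 1.0:
        r_prev = ratios[-1]
        # Finish in one step if the remaining edit is easy enough.
        if difficulty(r_prev, 1.0) <= threshold:
            ratios.append(1.0)
            break
        # Bisection for the largest admissible next ratio.
        lo, hi = r_prev, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if difficulty(r_prev, mid) <= threshold:
                lo = mid
            else:
                hi = mid
        if lo <= r_prev:
            raise ValueError("threshold too small to make progress")
        ratios.append(lo)
    return ratios


# Toy difficulty model (an assumption): edits get harder close to the target.
def toy_difficulty(a: float, b: float) -> float:
    return abs(b**2 - a**2)


ratios = schedule_subtasks(toy_difficulty, threshold=0.3)

# Placeholder encodings; a real pipeline would take these from a text encoder.
e_src, e_tgt = np.zeros(4), np.ones(4)
subtask_embeddings = [interpolate_embedding(e_src, e_tgt, r) for r in ratios[1:]]
```

Under this toy difficulty model the schedule takes larger steps early and smaller ones near the target. Each intermediate ratio also corresponds to an intermediate edit that could be previewed, which loosely mirrors the aggressivity control described in the summary.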