EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Shin Sangheon, Sangmin Kim, Sangpil Kim
2024-12-16
Abstract: Recent advancements in 3D editing have highlighted the potential of text-driven methods in real-time, user-friendly AR/VR applications. However, current methods rely on 2D diffusion models without adequately considering multi-view information, resulting in multi-view inconsistency. While 3D Gaussian Splatting (3DGS) significantly improves rendering quality and speed, its 3D editing process encounters difficulties with inefficient optimization, as pre-trained Gaussians retain excessive source information, hindering optimization. To address these limitations, we propose EditSplat, a novel 3D editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT). Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process, leveraging classifier-free guidance from the text-to-image diffusion model and the geometric properties of 3DGS. Additionally, our AGT leverages the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local edits. Through extensive qualitative and quantitative evaluations, EditSplat achieves superior multi-view consistency and editing quality over existing methods, significantly enhancing overall efficiency.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two shortcomings of existing 3D scene editing methods: multi-view inconsistency and inefficient optimization.

1. **Multi-view inconsistency**:
   - Existing methods built on 2D diffusion models ignore multi-view information during editing, so images edited from different viewpoints disagree with one another and the resulting scene looks blurry or unnatural.
   - For example, some methods use only a limited number of key views and need additional diffusion steps or extra network layers to integrate multi-view information, which raises the computational cost and degrades editing quality.
2. **Inefficient optimization**:
   - In a pre-trained 3D Gaussian Splatting (3DGS) model, the Gaussians retain too much source information, which hinders optimization, especially for local semantic edits.
   - These pre-trained Gaussians are slow to converge during editing, which lowers optimization efficiency and in turn degrades the edit.

To solve these problems, the authors propose a new framework named EditSplat, which combines two main techniques:

- **Multi-view Fusion Guidance (MFG)**:
  - MFG keeps the edited images consistent across viewpoints by integrating multi-view information into the diffusion process. It uses depth-map projection from 3DGS to blend the initially edited multi-view images into a smoothly fused multi-view image.
  - MFG also uses the text prompt and the source image as auxiliary guidance, balancing the guidance scores of the multi-view fusion image, the source image, and the text prompt so that the edits are more consistent and accurate.
- **Attention-Guided Trimming (AGT)**:
  - AGT assigns an attention weight to each Gaussian, then selectively prunes and optimizes the Gaussians with high attention weights, which improves optimization efficiency and enables semantically rich local edits.
  - Specifically, AGT uses the attention map to determine the regions that need to be modified and prunes the redundant Gaussians there before editing, which simplifies optimization and improves editing accuracy.

According to the authors' qualitative and quantitative evaluations, EditSplat outperforms existing methods in multi-view consistency and editing quality while also improving overall optimization efficiency.
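
To make the two techniques more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (`combine_guidance`, `attention_guided_trim`), the additive guidance formula, and the `prune_fraction` and `optimize_threshold` parameters are all illustrative assumptions. The summary above only states that MFG balances three guidance scores and that AGT prunes and optimizes Gaussians according to their attention weights; the exact formulation in the paper may differ.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_fused, eps_src, eps_text,
                     s_fused, s_src, s_text):
    """Blend three conditional noise predictions, classifier-free-guidance style.

    ASSUMPTION: each condition contributes an independent, scaled offset
    from the unconditional prediction. This is one common way to balance
    several guidance scores, not necessarily the paper's formula.
    """
    return (eps_uncond
            + s_fused * (eps_fused - eps_uncond)   # multi-view fusion image
            + s_src * (eps_src - eps_uncond)       # source image
            + s_text * (eps_text - eps_uncond))    # text prompt


def attention_guided_trim(weights, prune_fraction=0.15, optimize_threshold=0.1):
    """Return (keep_mask, optimize_mask) over Gaussians.

    ASSUMPTION: the highest-attention fraction of Gaussians is pruned
    before editing, and of the survivors only those whose weight exceeds
    `optimize_threshold` are updated. Both parameters are hypothetical.
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.size
    n_prune = int(round(prune_fraction * n))

    # Indices sorted by descending attention weight.
    order = np.argsort(-weights, kind="stable")

    prune_mask = np.zeros(n, dtype=bool)
    prune_mask[order[:n_prune]] = True

    keep_mask = ~prune_mask
    optimize_mask = keep_mask & (weights > optimize_threshold)
    return keep_mask, optimize_mask


if __name__ == "__main__":
    # Toy per-Gaussian attention weights (made-up values).
    w = [0.9, 0.05, 0.5, 0.2, 0.01]
    keep, opt = attention_guided_trim(w, prune_fraction=0.4,
                                      optimize_threshold=0.1)
    print("keep:    ", keep)
    print("optimize:", opt)
```

In this sketch, setting all three scales to zero recovers the unconditional prediction, so each scale directly controls how strongly its condition steers the denoising step. The trimming function returns boolean masks rather than modifying any Gaussian parameters, which keeps it independent of any particular 3DGS codebase.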