GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu
2024-07-14
Abstract: We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. This is achieved by the two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps; (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of maintaining multi-view consistency in text-driven 3D scene editing. Existing methods such as Instruct-NeRF2NeRF (IN2N) can edit 3D scenes from text instructions, but they struggle to keep the edited images consistent across views, so the final 3D result may show blur or implausible geometric changes. The root cause is that modern 2D diffusion models process one image at a time and therefore cannot enforce geometric consistency across views.

To overcome this, the paper proposes GaussCtrl, a text-driven method for editing 3D scenes reconstructed with 3D Gaussian Splatting (3DGS). Its main contributions are a depth-conditioned, multi-view consistent editing framework and an attention-based latent code alignment module. Together they allow all images to be edited at once while staying consistent with one another, which yields faster editing and higher visual quality.

The paper's contributions can be summarized as follows:
1. It proposes GaussCtrl, which efficiently edits 3DGS scenes from text instructions.
2. It uses depth guidance and an attention-based latent code alignment module to encourage multi-view consistent editing.
3. Experiments show that the method produces more realistic edits across a variety of 3D editing scenarios, with higher visual quality than existing methods.

According to the paper, these techniques both improve the quality of 3D scene editing and reduce processing time, offering new tools for 3D content creation.
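The attention-based latent code alignment described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the projection matrices `w_q`, `w_k`, `w_v`, the blend weight `lam`, and the choice to average over reference views are all assumptions made for illustration. The sketch only shows the general idea of mixing each view's self-attention with cross-view attention to a few reference views.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def aligned_attention(latents, ref_ids, w_q, w_k, w_v, lam=0.5):
    """Blend self-attention with cross-view attention to reference views.

    latents: (num_views, tokens, dim) latent tokens, one set per rendered view.
    ref_ids: indices of the views used as references (hypothetical choice).
    lam:     assumed blend weight; 1.0 is pure self-attention,
             0.0 is pure cross-view attention.
    """
    q = latents @ w_q
    k = latents @ w_k
    v = latents @ w_v
    out = np.empty_like(q)
    for i in range(latents.shape[0]):
        # Each view attends to its own keys and values...
        self_out = attention(q[i], k[i], v[i])
        # ...and to the keys and values of every reference view.
        cross_out = np.mean(
            [attention(q[i], k[r], v[r]) for r in ref_ids], axis=0
        )
        out[i] = lam * self_out + (1.0 - lam) * cross_out
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.normal(size=(4, 6, 8))  # 4 views, 6 tokens, 8 channels
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    aligned = aligned_attention(views, ref_ids=[0, 2], w_q=w_q, w_k=w_k, w_v=w_v)
    print(aligned.shape)
```

Because every view draws part of its output from the same reference keys and values, the edited views are pulled toward a shared appearance. In a real diffusion model this mixing would sit inside the network's attention layers rather than operate on standalone arrays.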