SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Qi Qian, Haiyang Xu, Ming Yan, Juhua Hu
2024-09-17
Abstract: Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the accumulated error results in undesired performance. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default settings can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses problems in text-guided image editing based on DDIM (Denoising Diffusion Implicit Models) inversion. Specifically:

1. **Problem Background**: Vanilla DDIM inversion is not optimized for classifier-free guidance, so the approximation error accumulates across steps and degrades editing results.
2. **Research Objective**: To reduce the approximation error in DDIM inversion by decoupling the guidance scales of the source branch and the target branch, while keeping the original framework.
3. **Main Contributions**:
   - Proposes a simple framework, SimInversion, which improves generation fidelity by adopting a symmetric guidance scale for the source branch.
   - Shows through theoretical analysis that an appropriate guidance scale further reduces the approximation error. The analysis derives 0.5 as a better value than the default settings, and the experimental results support this choice.
   - Validates the method on PIE-Bench, showing significant improvements over baseline methods across multiple evaluation metrics.

Overall, the paper aims to improve DDIM inversion without sacrificing efficiency, thereby improving the quality of text-guided image editing.
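
To make the idea of decoupled guidance scales concrete, here is a minimal NumPy sketch of inversion-based editing with a toy noise predictor. It is an illustration under stated assumptions, not the authors' implementation: `toy_eps`, `cfg_noise`, `ddim_move`, `invert`, and `generate` are hypothetical names, prompts are stand-in vectors, the noise schedule is arbitrary, and the target scale of 7.5 is only an assumed illustrative value. The sketch also reads "symmetric" as using the same scale for inverting and reconstructing the source branch, which is an interpretation of the abstract.

```python
import numpy as np

def toy_eps(x, t, cond):
    """Hypothetical stand-in for a noise-prediction network (not a real model)."""
    shift = 0.0 if cond is None else cond
    return 0.5 * np.tanh(x - shift)

def cfg_noise(eps_fn, x, t, cond, scale):
    """Classifier-free guidance: mix unconditional and conditional predictions."""
    eps_uncond = eps_fn(x, t, None)
    eps_cond = eps_fn(x, t, cond)
    return eps_uncond + scale * (eps_cond - eps_uncond)

def ddim_move(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative-alpha noise levels."""
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps

def invert(eps_fn, x0, alphas, cond, scale):
    """DDIM inversion (clean to noisy). The noise at step t+1 is approximated
    by the prediction at step t, which is the source of accumulated error."""
    x = x0
    for t in range(len(alphas) - 1):
        eps = cfg_noise(eps_fn, x, t, cond, scale)
        x = ddim_move(x, eps, alphas[t], alphas[t + 1])
    return x

def generate(eps_fn, x_T, alphas, cond, scale):
    """DDIM sampling (noisy to clean) under a given prompt and guidance scale."""
    x = x_T
    for t in range(len(alphas) - 1, 0, -1):
        eps = cfg_noise(eps_fn, x, t, cond, scale)
        x = ddim_move(x, eps, alphas[t], alphas[t - 1])
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)                 # toy "image"
alphas = np.linspace(0.999, 0.05, 20)   # assumed schedule, index 0 is cleanest
src_prompt = np.full(4, 0.3)            # stand-in prompt embeddings
tgt_prompt = np.full(4, -0.3)

src_scale = 0.5   # source branch: same scale for inversion and reconstruction
tgt_scale = 7.5   # target branch: assumed illustrative value

x_T = invert(toy_eps, x0, alphas, src_prompt, src_scale)
recon = generate(toy_eps, x_T, alphas, src_prompt, src_scale)
edited = generate(toy_eps, x_T, alphas, tgt_prompt, tgt_scale)
recon_error = float(np.linalg.norm(recon - x0))
print("source reconstruction error:", recon_error)
```

The reconstruction error printed here reflects only the toy predictor, so it says nothing about results on real models. What the sketch does show is that the source and target branches share the same inverted latent `x_T` while taking separate guidance scales, with no extra optimization steps.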