Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing

Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen
2024-12-15
Abstract: Text-conditional image editing is a practical AIGC task that has recently emerged with great commercial and academic value. For real image editing, most diffusion model-based methods use DDIM Inversion as the first stage before editing. However, DDIM Inversion often results in reconstruction failure, leading to unsatisfactory performance for downstream editing. To address this problem, we first analyze why the reconstruction via DDIM Inversion fails. We then propose a new inversion and sampling method named Dual-Schedule Inversion. We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing. Our work can achieve superior reconstruction and editing performance with the following advantages: 1) It can reconstruct real images perfectly without fine-tuning, and its reversibility is guaranteed mathematically. 2) The edited object/scene conforms to the semantics of the text prompt. 3) The unedited parts of the object/scene retain the original identity.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a weakness of existing diffusion-model-based image editing methods: they reconstruct real images poorly. In particular, DDIM Inversion (Denoising Diffusion Implicit Model Inversion) often fails to reconstruct the input image, which degrades any editing built on top of it. Specifically:

1. **Problem Description**:
   - For real-image editing, most diffusion-model methods use DDIM Inversion as the first step. However, DDIM Inversion often fails to reconstruct the image, so downstream editing performs unsatisfactorily.
   - When reconstruction fails, the edited image may not preserve the original identity of the unedited parts; that is, the object/scene after editing is inconsistent with the object/scene in the original image.
2. **Solution**:
   - The paper first analyzes why DDIM Inversion leads to reconstruction failures and then proposes a new inversion and sampling method called Dual-Schedule Inversion.
   - Dual-Schedule Inversion introduces two different schedules so that inversion and sampling are exact inverses of each other, which yields perfect reconstruction of real images.
   - In addition, the authors design a classifier that adaptively combines Dual-Schedule Inversion with different editing methods for user-friendly image editing.
3. **Advantages**:
   - Dual-Schedule Inversion can perfectly reconstruct real images without fine-tuning the network, and its invertibility is mathematically guaranteed.
   - The edited object/scene conforms to the semantics of the text prompt.
   - The unedited parts retain their original identity.

In summary, this paper aims to solve the reconstruction failures of existing real-image editing methods by proposing Dual-Schedule Inversion, thereby improving editing performance and the quality of the editing results.
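The reconstruction failure of plain DDIM Inversion described above can be illustrated with a small toy sketch. It is not the paper's Dual-Schedule Inversion, whose details are not given here. It only shows the commonly cited source of the error: the inversion step evaluates the noise predictor at the current state, while the matching sampling step evaluates it at the next state, so the two steps are not exact inverses. The scalar "image", the `toy_eps` predictor, and the linear `alpha_bar` schedule are all hypothetical stand-ins chosen for illustration.

```python
import math


def alpha_bar(i, num_steps):
    """Toy cumulative signal level (assumption): 1.0 at step 0, 0.02 at the last step."""
    return 1.0 - 0.98 * i / num_steps


def toy_eps(x, i):
    """Hypothetical stand-in for a noise-prediction network; it ignores the step index."""
    return math.tanh(2.0 * x)


def ddim_step(x, a_from, a_to, eps):
    """Deterministic DDIM update from signal level a_from to a_to using noise estimate eps."""
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def invert(x0, num_steps):
    """Plain DDIM inversion: walk from the clean input toward noise."""
    x, eps_trace = x0, []
    for i in range(num_steps):
        # eps is evaluated at x_i, although an exact inverse would need it at x_{i+1}.
        eps = toy_eps(x, i)
        eps_trace.append(eps)
        x = ddim_step(x, alpha_bar(i, num_steps), alpha_bar(i + 1, num_steps), eps)
    return x, eps_trace


def sample(x_T, num_steps, eps_trace=None):
    """DDIM sampling back to step 0; optionally replay the eps values stored by invert()."""
    x = x_T
    for i in range(num_steps, 0, -1):
        eps = toy_eps(x, i) if eps_trace is None else eps_trace[i - 1]
        x = ddim_step(x, alpha_bar(i, num_steps), alpha_bar(i - 1, num_steps), eps)
    return x


x0 = 0.5
x_T, trace = invert(x0, 50)
naive_error = abs(sample(x_T, 50) - x0)           # predictor re-evaluated while sampling
control_error = abs(sample(x_T, 50, trace) - x0)  # same eps reused in both directions
print(f"naive reconstruction error:   {naive_error:.6f}")
print(f"control reconstruction error: {control_error:.2e}")
```

In this sketch the naive round trip does not return to the input, whereas replaying the stored noise estimates does, up to floating-point error. The control run only isolates the cause of the mismatch; it is not an editing method, because editing changes the conditioning and therefore the noise estimates.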