ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing

Jimin Dai, Yingzhen Zhang, Shuo Chen, Jian Yang, Lei Luo
2024-10-18
Abstract: Diffusion models (DMs) have been successfully applied to real image editing. These models typically invert images into latent noise vectors used to reconstruct the original images (known as inversion), and then edit them during the inference process. However, recent popular DMs often rely on the assumption of local linearization, where the noise injected during the inversion process is expected to approximate the noise removed during the inference process. While DMs efficiently generate images under this assumption, they can also accumulate errors during the diffusion process, ultimately degrading the quality of real image reconstruction and editing. To address this issue, we propose a novel method, referred to as ERDDCI (Exact Reversible Diffusion via Dual-Chain Inversion). ERDDCI uses the new Dual-Chain Inversion (DCI) for joint inference to derive an exact reversible diffusion process. By using DCI, our method effectively avoids the cumbersome optimization process in existing inversion approaches and achieves high-quality image editing. Additionally, to accommodate image operations under high guidance scales, we introduce a dynamic control strategy that enables more refined image reconstruction and editing. Our experiments demonstrate that ERDDCI significantly outperforms state-of-the-art methods in a 50-step diffusion process. It achieves rapid and precise image reconstruction with an SSIM of 0.999 and an LPIPS of 0.001, and also delivers competitive results in image editing.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the quality degradation that diffusion models (DMs) suffer in real-image editing. Specifically:

1. **Cumulative error**:
   - Existing diffusion models usually rely on a local linearization assumption: the noise injected during the inversion process is expected to approximate the noise removed during the inference process. Although this assumption lets diffusion models generate images efficiently, the mismatch accumulates over the diffusion steps and degrades the quality of reconstructed and edited images.
   - These cumulative errors are particularly pronounced at high guidance scales, where they amplify the deviation of semantic information and reduce the fidelity of edited images.
2. **Requirements for high-quality image editing**:
   - The edited image should remain highly consistent with the original image, without changing the original layout or making unnecessary semantic changes.
   - The edited image must meet the editing requirements, that is, accurately modify the specified semantic information.
3. **Limitations of existing methods**:
   - Although some existing methods, such as Prompt-to-Prompt (PTP) and Null-Text Inversion (NTI), improve the quality of image editing to a certain extent, they still involve complex optimization, incur high computational cost, and are prone to semantic deviation at high guidance scales.

### Solutions

To overcome these problems, the paper proposes a new method called ERDDCI (Exact Reversible Diffusion via Dual-Chain Inversion). Its main contributions are:

1. **Dual-Chain Inversion (DCI)**:
   - By introducing an auxiliary inversion chain, ERDDCI aligns the noise injected during inversion with the noise removed during inference, which yields an exactly reversible diffusion process.
   - Specifically, ERDDCI predicts the noise of the current latent variable at each time step and injects it into the latent variable of the next time step to form an auxiliary inversion chain. During inference, the noise recorded in the auxiliary inversion chain is removed exactly, so the image can be reconstructed and edited accurately.
2. **Dynamic Control Strategy (DCS)**:
   - To further improve reconstruction and editing quality at high guidance scales, ERDDCI introduces a dynamic control strategy. DCS avoids the semantic drift that arises at high guidance scales by activating and adjusting the guidance scale gradually.
   - Concretely, the guidance scale is increased gradually in the initial stage of image generation so that it matches the semantic content of the image generated at the current time step, which prevents major semantic deviations.

### Experimental results

Experiments show that ERDDCI significantly outperforms existing state-of-the-art methods in a 50-step diffusion process. It reconstructs images quickly and accurately, reaching an SSIM of 0.999 and an LPIPS of 0.001, and it delivers competitive results in image editing. At high guidance scales in particular, ERDDCI completes image reconstruction and editing with higher precision.

### Summary

By introducing dual-chain inversion and the dynamic control strategy, ERDDCI eliminates the cumulative errors of existing methods, improves the quality of image reconstruction and editing, and significantly reduces computational cost and running time. These improvements make ERDDCI more practical and robust in real-world applications.
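The cumulative error attributed above to the local linearization assumption can be illustrated with a small, self-contained sketch. This is not the ERDDCI algorithm: it is a scalar toy of standard deterministic DDIM inversion followed by DDIM sampling. The noise predictor `toy_eps`, the schedule in `make_alpha_bar` (including its endpoint values), and the helper names `ddim_move`, `invert`, `sample`, and `round_trip_error` are all assumptions introduced for illustration and do not come from the paper.

```python
import math


def make_alpha_bar(num_steps, start=0.9999, end=0.5):
    """Toy cumulative-alpha schedule (assumed values), decreasing linearly
    from `start` at t=0 to `end` at t=num_steps."""
    return [start + (end - start) * t / num_steps for t in range(num_steps + 1)]


def toy_eps(x, t, num_steps):
    """Hypothetical stand-in for a trained noise predictor:
    nonlinear in the latent x and dependent on the time step t."""
    return math.tanh(x) * (0.5 + 0.5 * t / num_steps)


def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update between two noise levels,
    given a fixed noise estimate `eps`."""
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def invert(x0, alpha_bar, num_steps):
    """Approximate DDIM inversion: the noise is predicted at the current
    latent x_t and reused for the step to t+1 (local linearization)."""
    x = x0
    for t in range(num_steps):
        eps = toy_eps(x, t, num_steps)
        x = ddim_move(x, eps, alpha_bar[t], alpha_bar[t + 1])
    return x


def sample(x_last, alpha_bar, num_steps):
    """DDIM sampling: the noise removed at step t is predicted at x_t,
    which generally differs from the noise injected during inversion."""
    x = x_last
    for t in range(num_steps, 0, -1):
        eps = toy_eps(x, t, num_steps)
        x = ddim_move(x, eps, alpha_bar[t], alpha_bar[t - 1])
    return x


def round_trip_error(x0, num_steps):
    """Absolute reconstruction error after inverting and then sampling."""
    alpha_bar = make_alpha_bar(num_steps)
    x_last = invert(x0, alpha_bar, num_steps)
    return abs(sample(x_last, alpha_bar, num_steps) - x0)


if __name__ == "__main__":
    for n in (50, 1000):
        print(n, "steps -> round-trip error", round_trip_error(0.5, n))
```

In this toy, a single `ddim_move` is undone exactly when the same `eps` is used in both directions, so the round-trip error comes only from the injected and removed noise estimates differing. That is the gap the summary describes DCI as closing by keeping the two aligned. Using more steps shrinks the error here but does not remove it.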