Abstract: Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backwards process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.

What problem does this paper attempt to address?

This paper addresses two problems that arise when fast-sampling diffusion models are used for text-guided image editing: visual artifacts and insufficient editing strength.

### Problems Addressed

1. **Visual Artifacts**: Text-guided image editing methods designed for multi-step diffusion processes produce noticeable visual artifacts when applied to fast-sampling diffusion models (e.g., 1-8 steps).
2. **Insufficient Editing Strength**: In fast-sampling scenarios, changing the text prompt that describes the image often yields results that only weakly reflect the new prompt, i.e., the edit is barely visible.

### Method Overview

To address these challenges, the authors propose the following:

- **Analyzing the Statistics of the Noise Inversion Process**: The authors find that the noise maps obtained through noise inversion deviate from the statistics of standard Gaussian noise. They propose a time-step shift that corrects this deviation, reducing or eliminating the visual artifacts.
- **Enhancing Editing Strength**: The authors amplify the key terms of the editing process in a manner similar to Classifier-Free Guidance (CFG), but with fewer network evaluations, to increase the influence of the editing prompt on the final image.
- **Equivalence Analysis**: The authors also analyze the equivalence between their method and the Delta Denoising Score (DDS) method. This helps explain why both methods succeed and suggests how editing efficiency can be improved further.

### Experimental Results

The experimental section presents qualitative and quantitative results for the proposed method. Compared with various baseline methods, it edits faster while maintaining or even improving editing quality. An ablation that compares the effects of the different components verifies the importance of each one.
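
The two ideas in the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation: `guided_prediction` shows only the generic CFG-style extrapolation that the summary compares the guidance step to, and `gaussian_deviation` is a hypothetical helper that reports how far a noise map's mean and standard deviation are from the (0, 1) expected of standard Gaussian noise. The function names, the arrays, and the scale value are all assumptions made for illustration.

```python
import numpy as np


def guided_prediction(pred_base, pred_edit, scale):
    """CFG-style extrapolation (illustrative, not the paper's exact rule).

    Moves from the base prediction toward the edit-prompt prediction.
    scale = 1 returns pred_edit unchanged; scale > 1 amplifies the edit.
    """
    pred_base = np.asarray(pred_base, dtype=float)
    pred_edit = np.asarray(pred_edit, dtype=float)
    return pred_base + scale * (pred_edit - pred_base)


def gaussian_deviation(noise):
    """Return (mean, std) of a noise map.

    Standard Gaussian noise should give values close to (0, 1); a large
    gap indicates mismatched noise statistics.
    """
    noise = np.asarray(noise, dtype=float)
    return float(noise.mean()), float(noise.std())


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Stand-ins for two network predictions (hypothetical data).
    base = rng.standard_normal((4, 4))
    edit = base + 0.1
    amplified = guided_prediction(base, edit, scale=3.0)
    print("max shift from base:", np.abs(amplified - base).max())

    # A noise map with inflated variance, mimicking a statistics mismatch.
    mismatched = 1.5 * rng.standard_normal((256, 256))
    print("mean, std:", gaussian_deviation(mismatched))
```

In this sketch a larger `scale` increases the magnitude of the edit linearly, and a standard deviation well away from 1 marks a noise map whose statistics do not match a standard Gaussian.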

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

TurboEdit: Instant text-based image editing

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models

High-Fidelity Diffusion-based Image Editing

Diffusion Model-Based Image Editing: A Survey

Inversion-Free Image Editing with Natural Language

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing

Pix2Video: Video Editing using Image Diffusion

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Regularized Newton Raphson Inversion for Text-to-Image Diffusion Models

Null-text Inversion for Editing Real Images using Guided Diffusion Models