Abstract: Recent advances in text-guided image editing enable users to perform image edits through simple text inputs, leveraging the extensive priors of multi-step diffusion-based text-to-image models. However, these methods often fall short of the speed demands of real-world and on-device applications because of the costly multi-step inversion and sampling process involved. In response, we introduce SwiftEdit, a simple yet highly efficient editing tool that achieves instant text-guided image editing (in 0.23s). The advancement of SwiftEdit lies in its two novel contributions: a one-step inversion framework that enables one-step image reconstruction via inversion, and a mask-guided editing technique with our proposed attention rescaling mechanism to perform localized image editing. Extensive experiments are provided to demonstrate the effectiveness and efficiency of SwiftEdit. In particular, SwiftEdit enables instant text-guided image editing, which is substantially faster than previous multi-step methods (at least 50 times faster) while maintaining competitive performance in editing results. Our project page is at https://swift-edit.github.io/

What problem does this paper attempt to address?

This paper addresses the slow speed of existing text-guided image editing methods, in particular the inefficiency of multi-step diffusion models in practical and on-device applications. Existing methods can edit an image from a simple text input, but they usually rely on multi-step inversion and sampling, which leads to high computational cost and long processing times and rules out real-time use.

To address this, the paper proposes SwiftEdit, a simple and efficient text-guided image editing tool that completes an edit in 0.23 seconds. Its two main contributions are a one-step inversion framework, which reconstructs the image in a single step, and a mask-guided editing technique that uses the proposed attention rescaling mechanism to perform localized edits. Together they greatly improve editing speed while keeping editing quality comparable to multi-step methods, which makes the approach better suited to interactive and on-device use.

Specifically, SwiftEdit tackles the problem in the following ways:

1. **One-step inversion framework**: In contrast to traditional multi-step inversion methods, SwiftEdit designs a new one-step inversion framework that converts the input image into an editable latent representation in a single operation. This removes the repeated iterations that traditional methods need to recover the initial noise, greatly reducing computation time and resource consumption.
2. **Mask-guided editing technique**: SwiftEdit introduces a new attention rescaling technique that flexibly controls the editing strength while preserving background elements. Users specify the editing region directly through text prompts, without supplying an additional mask, which simplifies the editing process.

With these techniques, SwiftEdit achieves very fast text-guided image editing (at least 50 times faster than previous multi-step methods) while maintaining editing quality comparable to those methods. Experimental results show that SwiftEdit performs well on background preservation (PSNR), editing semantics (CLIP score), and running time, supporting its efficiency and practicality in real applications.
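To make the ideas of localized editing and background preservation more concrete, the following NumPy sketch blends an edited image into a source image only inside a mask and then measures PSNR over the unmasked background. It is a simplified stand-in, not SwiftEdit's implementation: the paper's attention rescaling operates inside the diffusion network, whereas this example uses plain mask-weighted pixel blending. The function names `blend_with_mask` and `background_psnr`, the `strength` parameter, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np


def blend_with_mask(source, edited, mask, strength=1.0):
    """Keep `source` where mask is 0; move toward `edited` where mask is 1.

    `strength` (hypothetical parameter) scales how strongly the edit is
    applied inside the masked region.
    """
    # Add a trailing axis so the (H, W) weight broadcasts over channels.
    weight = np.clip(mask * strength, 0.0, 1.0)[..., None]
    return weight * edited + (1.0 - weight) * source


def background_psnr(source, result, mask, max_value=1.0):
    """PSNR computed only over pixels outside the edit mask."""
    background = mask < 0.5
    if not background.any():
        raise ValueError("mask covers the whole image; no background to score")
    mse = np.mean((source[background] - result[background]) ** 2)
    if mse == 0:
        return float("inf")  # background is bit-identical to the source
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy data: random "images" in [0, 1] and a square edit region.
rng = np.random.default_rng(0)
source = rng.random((8, 8, 3))
edited = rng.random((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

result = blend_with_mask(source, edited, mask, strength=0.8)
psnr_blended = background_psnr(source, result, mask)    # background untouched
psnr_unmasked = background_psnr(source, edited, mask)   # background overwritten
```

In this toy setup the blended result leaves the background identical to the source, so its background PSNR is infinite, while using the unmasked `edited` image directly gives a finite, much lower score. Higher background PSNR indicates that less of the unedited region was disturbed.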

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
