Abstract: We introduce SeedEdit, a diffusion model that is able to revise a given image with any text prompt. In our perspective, the key to such a task is to obtain an optimal balance between maintaining the original image, i.e. image reconstruction, and generating a new image, i.e. image re-generation. To this end, we start from a weak generator (text-to-image model) that creates diverse pairs between such two directions and gradually align it into a strong image editor that well balances between the two tasks. SeedEdit can achieve more diverse and stable editing capability over prior image editing methods, enabling sequential revision over images generated by diffusion models.

What problem does this paper attempt to address?

The paper attempts to address a core challenge in image editing: balancing **Image Reconstruction** against **Image Re-generation**.

### Specific Problem Description:

1. **Insufficient Controllability in Image Editing**:
   - Current diffusion models can generate realistic and diverse images from text descriptions, but the generated images are often difficult to control. The generation process is more like "rolling the dice" until a good output is seen.
   - To achieve better control over the generated content, a method is needed to modify the input image according to text instructions, i.e., **instruction-based image editing**.
2. **Limitations of Existing Methods**:
   - **Training-free Methods**: These methods combine techniques such as DDIM inversion, test-time fine-tuning, and attention control to reconstruct the input image and generate new images. However, because the reconstruction and re-generation processes are unstable, errors accumulate in the edited images, and the outputs become inconsistent with the input image or the target description.
   - **Data-driven Methods**: These methods require large-scale paired editing datasets to train instruction-based diffusion models. However, image editing pairs are very rare, so it is almost impossible to collect a diverse, high-quality dataset that covers all types of edits.

### Proposed Solution in the Paper:

- **SeedEdit Framework**: This framework aims to transform an image generation diffusion model into an image editing model. By gradually aligning the generation model, it seeks the optimal balance between image reconstruction and re-generation.
- **Data Generation and Model Optimization**: First, a pre-trained text-to-image (T2I) model is used to generate diverse paired data. Then, through iterative data sampling and model optimization, the diffusion model is gradually aligned toward the best editing behavior.
- **Causal Diffusion Model**: A causal diffusion model is proposed that handles image and text conditions simultaneously, improving both the accuracy of the edit and the consistency of the image.

### Experimental Results:

- Experimental results on the HQ-Edit and Emu Edit benchmark datasets show that SeedEdit significantly outperforms existing methods in editing performance. The gain is most pronounced on HQ-Edit, where SeedEdit achieves higher CLIP image similarity, indicating better retention of the original image content.
- Quality evaluation results show that SeedEdit has a higher success rate in handling vague instructions and fine-grained edits.

In conclusion, by proposing the SeedEdit framework, the paper addresses the balance between image reconstruction and re-generation in image editing, improving the controllability and accuracy of image editing.
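The CLIP image similarity mentioned in the experimental results is commonly computed as the cosine similarity between the CLIP image embeddings of the input image and the edited image. The following is a minimal sketch of that computation, not the paper's evaluation code. It assumes the embeddings have already been produced by a CLIP image encoder (not shown here); the function names and the toy 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_image_similarity(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average similarity over (source_embedding, edited_embedding) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    scores = [cosine_similarity(src, out) for src, out in pairs]
    return sum(scores) / len(scores)


# Toy vectors standing in for real CLIP image embeddings (assumption:
# real embeddings are much higher-dimensional and come from an encoder).
source = [1.0, 0.0, 0.0]
faithful_edit = [1.0, 0.0, 0.0]     # points the same way as the source
unrelated_output = [0.0, 1.0, 0.0]  # orthogonal to the source

print(cosine_similarity(source, faithful_edit))     # 1.0
print(cosine_similarity(source, unrelated_output))  # 0.0
```

A score near 1 suggests the edited image retains most of the original content, while a lower score suggests the output has drifted toward re-generation. On its own this metric rewards doing nothing, so benchmarks typically report it alongside a measure of how well the edit follows the instruction.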

SeedEdit: Align Image Re-Generation to Image Editing
