Abstract: Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining consistency between the subject and the background remains challenging. In this paper, we propose LayerDiffusion, a semantic-based layered controlled image editing method. Our method enables non-rigid editing and attribute modification of specific subjects while preserving their unique characteristics and seamlessly integrating them into new backgrounds. We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy combined with layered diffusion training. During the diffusion process, an iterative guidance strategy is used to generate a final image that aligns with the textual description. Experimental results demonstrate the effectiveness of our method in generating highly coherent images that closely align with the given textual description. The edited images maintain a high similarity to the features of the input image and surpass the performance of current leading image editing methods. LayerDiffusion opens up new possibilities for controllable image editing.

What problem does this paper attempt to address?

This paper addresses the problem of performing multiple editing operations on a single image simultaneously, such as background replacement and changes to a specific subject's attributes, while keeping the subject and the background consistent with each other.

Existing text-guided image editing methods can generate high-quality images from text descriptions, but they struggle to reproduce the characteristics of a specific subject. Even with highly detailed text descriptions, the generated subject may differ in appearance from the original, and the background is hard to keep consistent. Current leading image editing methods also face several limitations: rigid editing that is limited to images from specific domains, the inability to edit the background and a specific subject at the same time, and the need for additional auxiliary input. These problems have hindered the development of controllable image editing.

To alleviate these problems, the paper proposes LayerDiffusion, a semantic-based layered controlled image editing method. Given text descriptions of multiple editing actions, the target image, and the reference image, LayerDiffusion can perform non-rigid editing and attribute modification on specific subjects, generate images consistent with the text descriptions, and keep the features of both the subject and the background highly similar to those of the input image.

Specifically, the main contributions of LayerDiffusion include:

1. **Simultaneous editing of a specific subject and the background on a single input image, for the first time**: Most previous methods handle only a single type of editing task or require multiple input images to complete complex editing operations.
2. **A new layered diffusion training framework**: This framework makes it possible to perform arbitrary and controllable editing of specific subjects and the background, enhancing the flexibility and controllability of the model.
3. **Experimental results showing that the generated images remain highly similar to the input image in terms of features**: This improves editing quality and helps preserve the visual consistency and naturalness of the edited image.

In conclusion, by combining a large-scale text-to-image model with a layered controlled optimization strategy, LayerDiffusion addresses the limitations of existing image editing methods in multi-task editing and background consistency, bringing new possibilities to the field of image editing.

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

PFB-Diff: Progressive Feature Blending diffusion for text-driven image editing

Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

Streamlining Image Editing with Layered Diffusion Brushes

Continuous Layout Editing of Single Images with Diffusion Models

Multi-Region Text-Driven Manipulation of Diffusion Imagery

Text2Layer: Layered Image Generation using Latent Diffusion Model

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

Move Anything with Layered Scene Diffusion

Diffusion Model-Based Image Editing: A Survey

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

Region-Aware Diffusion for Zero-shot Text-driven Image Editing

Imagic: Text-Based Real Image Editing with Diffusion Models