LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li
2023-05-30
Abstract: Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining consistency between the subject and the background remains challenging. In this paper, we propose LayerDiffusion, a semantic-based layered controlled image editing method. Our method enables non-rigid editing and attribute modification of specific subjects while preserving their unique characteristics and seamlessly integrating them into new backgrounds. We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy combined with layered diffusion training. During the diffusion process, an iterative guidance strategy is used to generate a final image that aligns with the textual description. Experimental results demonstrate the effectiveness of our method in generating highly coherent images that closely align with the given textual description. The edited images maintain a high similarity to the features of the input image and surpass the performance of current leading image editing methods. LayerDiffusion opens up new possibilities for controllable image editing.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of performing multiple editing operations on a single image at the same time, such as background replacement and attribute changes to a specific subject, while keeping the subject and the background consistent with each other. Existing text-guided image editing methods can generate high-quality images from text descriptions, but they struggle to reproduce the characteristics of a specific subject. Even with a very detailed text description, the generated subject may look different from the original, and the background is hard to keep consistent. Current leading image editing methods also face several other limitations: they are restricted to rigid edits on images from specific domains, they cannot edit the background and a specific subject simultaneously, and they require additional auxiliary input. These problems have hindered the development of controllable image editing.

To alleviate them, the paper proposes LayerDiffusion, a semantic-based layered controlled image editing method. Given only the text descriptions of multiple editing actions, the target image, and a reference image, LayerDiffusion performs non-rigid editing and attribute modification on specific subjects, generates images consistent with the text descriptions, and keeps the subject and background features highly similar to those of the input image.

Specifically, the main contributions of LayerDiffusion include:
1. **Simultaneous editing of specific subjects and the background on a single input image, for the first time**: The paper presents this as an important advance, because most previous methods handle only a single type of editing task or require multiple input images to complete complex edits.
2. **A new layered diffusion training framework**: This framework enables arbitrary and controllable editing of specific subjects and the background, improving the flexibility and controllability of the model.
3. **Experimental results showing that the generated images are highly similar to the input image in terms of features**: This improves editing quality and helps keep the edited image visually consistent and natural.

In conclusion, by combining a large-scale text-to-image model with a layered controlled optimization strategy, LayerDiffusion addresses the limitations of existing image editing methods in multi-action editing and background consistency, opening new possibilities for controllable image editing.