Abstract: Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts given a few images for reusing the acquired concepts in a novel context. With massive efforts being dedicated to personalized generation, a promising extension is personalized editing, namely to edit an image using personalized concepts, which can provide a more precise guidance signal than traditional textual guidance. To address this, a straightforward solution is to incorporate a personalized diffusion model with a text-driven editing framework. However, such a solution often shows unsatisfactory editability on the source image. To address this, we propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods. Specifically, we enhance the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective. Moreover, we identify a mode trapping issue with EDSD, and propose a mode shifting regularization with spatial feature guided sampling to avoid such an issue. We further employ two key modifications to the Delta Denoising Score framework that enable high-fidelity local editing with personalized concepts. Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient.
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **How to enhance the editability of personalized diffusion models when editing source images**. Specifically, the authors focus on the fact that, when an image is edited with a personalized concept, existing methods struggle to capture the appearance and content of the target concept accurately while preserving the structure and background of the source image.
### Specific description of the problem
1. **Difference between personalized generation and editing**:
- Existing personalized generation methods (such as DreamBooth) mainly focus on generating new images while retaining the concept identity.
- However, personalized editing requires precise editing based on the source image, which requires higher editability, especially in complex editing scenarios (for example, in cases with significant structural differences).
2. **Limitations of existing methods**:
- Directly combining a personalized diffusion model with a text-driven editing framework usually leads to severe distortion of the source image layout or to edits that do not adapt naturally to the source image.
- This is mainly because the reference images used for personalization lack diversity, so the model easily gets trapped in the patterns of the reference images and has difficulty handling new situations in the source image.
3. **Challenges of editing requirements**:
- During the editing process, it is necessary not only to maintain the structure and background of the source image but also to accurately capture the appearance and content of personalized concepts.
- In more complex editing scenarios, it may be necessary to extrapolate the properties of personalized concepts to a certain extent, such as changing the structure, pose, or style of the subject.
### Solutions proposed in the paper
To solve the above problems, the authors propose **DreamSteerer**, a plug-in method for existing T2I personalization methods that enhances the source-image-conditioned editability of a personalized diffusion model through the following technical means:
1. **Editability Driven Score Distillation (EDSD)**:
- A new score distillation objective is proposed, aiming to optimize the parameters of the personalized diffusion model so that it can better meet the editing requirements of the source image.
- By introducing perturbations in the single-step denoising direction, the personalized model can learn more accurate noise predictions, thereby improving the editing quality.
2. **Mode Shifting Regularization and Spatial Feature-Guided Sampling**:
- Identifies and addresses the mode-trapping problem in EDSD, in which the personalized model may get trapped in an intermediate mode between the source image and the reference images.
- Introduces a spatial feature-guided sampling strategy so that the generated image has a structural layout similar to that of the source image while maintaining the appearance of the personalized concept.
3. **Automatic Subject Mask**:
- To better preserve the parts of the source image that are unrelated to the edit, the authors propose a method for automatically extracting the subject mask, so that the editing focuses on the relevant regions.
Through these innovations, DreamSteerer can significantly improve the editability of several T2I personalization baselines, especially in cases with significant structural differences, while remaining computationally efficient.
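To make the idea of a masked, delta-style score distillation update more concrete, the following is a minimal toy sketch in Python. It is not the paper's EDSD objective and not the authors' implementation. It only illustrates the general pattern behind Delta Denoising Score style editing: the noise prediction for the edited latent under the target condition is compared with the prediction for the source latent under the source condition, using the same noise, and the resulting gradient is restricted to a subject mask. The denoiser `toy_eps_pred`, the helper names, the 4x4 latents, and the hand-written square mask are all assumptions introduced for illustration.

```python
import numpy as np


def add_noise(z, eps, alpha_bar):
    """Forward-diffuse z to noise level alpha_bar using the given noise eps."""
    return np.sqrt(alpha_bar) * z + np.sqrt(1.0 - alpha_bar) * eps


def toy_eps_pred(z_t, cond, alpha_bar):
    """Hypothetical stand-in for a denoiser: the exact noise prediction
    when all data under condition `cond` sits at the single point `cond`."""
    return (z_t - np.sqrt(alpha_bar) * cond) / np.sqrt(1.0 - alpha_bar)


def masked_delta_grad(z_edit, z_src, cond_tgt, cond_src, mask, alpha_bar, rng):
    """Delta-style gradient: target-branch prediction minus source-branch
    prediction under shared noise, restricted to the masked region."""
    eps = rng.standard_normal(z_edit.shape)
    pred_tgt = toy_eps_pred(add_noise(z_edit, eps, alpha_bar), cond_tgt, alpha_bar)
    pred_src = toy_eps_pred(add_noise(z_src, eps, alpha_bar), cond_src, alpha_bar)
    return mask * (pred_tgt - pred_src)


def run_toy_edit(steps=200, lr=0.1, alpha_bar=0.5, seed=0):
    """Optimize a copy of the source latent toward the target condition,
    but only inside an assumed (hand-written) subject mask."""
    rng = np.random.default_rng(seed)
    z_src = np.zeros((4, 4))      # toy "source image" latent
    cond_src = np.zeros((4, 4))   # toy source condition
    cond_tgt = np.ones((4, 4))    # toy personalized-concept condition
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0          # assumed subject region
    z_edit = z_src.copy()
    for _ in range(steps):
        grad = masked_delta_grad(z_edit, z_src, cond_tgt, cond_src, mask, alpha_bar, rng)
        z_edit -= lr * grad
    return z_edit, mask
```

In this toy setting, calling `run_toy_edit()` moves the masked region of the latent toward the target condition and leaves everything outside the mask untouched, which mirrors the goal of editing the subject while preserving the background. In a real pipeline the toy denoiser would be replaced by the personalized diffusion model, and the mask would come from an automatic extraction step rather than being written by hand.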
### Summary
The core problem of this paper is **How to use personalized concepts to perform high-fidelity editing of images while maintaining the structure and background of the source image**. DreamSteerer addresses this problem by introducing techniques such as EDSD, mode shifting regularization, and an automatic subject mask, and the authors validate its effectiveness on several T2I personalization baselines.