Abstract: Our work addresses limitations seen in previous approaches for object-centric editing problems, such as unrealistic results due to shape discrepancies and limited control in object replacement or insertion. To this end, we introduce FlexEdit, a flexible and controllable editing framework for objects where we iteratively adjust latents at each denoising step using our FlexEdit block. Initially, we optimize latents at test time to align with specified object constraints. Then, our framework employs an adaptive mask, automatically extracted during denoising, to protect the background while seamlessly blending new content into the target image. We demonstrate the versatility of FlexEdit in various object editing tasks and curate an evaluation test suite with samples from both real and synthetic images, along with novel evaluation metrics designed for object-centric editing. We conduct extensive experiments on different editing scenarios, demonstrating the superiority of our editing framework over recent advanced text-guided image editing methods. Our project page is published at
What problem does this paper attempt to address?
This paper addresses limitations of existing methods for object-centric image editing: edited objects often look unrealistic because of shape discrepancies, and users have limited control over object replacement and insertion. To overcome these problems, the authors propose FlexEdit, a flexible and controllable editing framework that targets three tasks:
1. **Object Replacement**: Flexibly adjust the size and position of the replacement object so that it better matches the user's editing intent.
2. **Object Addition**: Add new objects naturally without requiring an additional mask input.
3. **Object Removal**: Remove an object without degrading the quality of the original image.
Specifically, FlexEdit achieves these goals by adjusting the latents at each denoising step and by using an adaptive mask to protect the background. The authors also introduce a new evaluation test suite and new metrics for object-centric image editing.
### Main Contributions
1. **Propose a New Editing Framework**: FlexEdit, a flexible and controllable framework for object-centric image editing.
2. **Introduce a New Test Suite**: Test samples from both real and synthetic images, together with new evaluation metrics designed for object-centric image editing.
3. **Conduct Extensive Evaluations**: Comparative experiments against recent text-guided image editing methods on different benchmark datasets demonstrate the superiority of FlexEdit in a range of flexible and customizable object editing applications.
### Method Overview
1. **Latent Optimization**: Obtain editing semantics by optimizing latent variables, including size and position control during object replacement and attention separation during object addition.
2. **Latent Fusion**: Use an adaptive binary mask to fuse the edited latent variables with the background information of the source image, so that the edited region blends seamlessly with the background.
3. **Iterative Latent Manipulation**: Ensure the quality of the editing results by iteratively performing latent optimization and latent fusion.
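The three steps above can be sketched as a toy loop. This is only an illustration under strong assumptions, not the authors' implementation: the denoiser is omitted, the object constraint is replaced by a hypothetical quadratic loss that pulls the latents toward a `target` array, and the mask is passed in rather than extracted during denoising. The names `optimize_latent`, `fuse`, and `edit` are made up for this sketch.

```python
import numpy as np

def optimize_latent(z, target, lr=0.5, n_iters=3):
    """Hypothetical stand-in for latent optimization: gradient descent on the
    toy loss 0.5 * ||z - target||^2, whose gradient is (z - target)."""
    for _ in range(n_iters):
        z = z - lr * (z - target)
    return z

def fuse(z_edit, z_src, mask):
    """Latent fusion: edited latents inside the mask, source latents outside."""
    return z_edit * mask + z_src * (1.0 - mask)

def edit(z_src_per_step, target, mask):
    """Alternate optimization and fusion over the denoising steps.

    z_src_per_step: list of source-image latents, one per step
    (assumed to be precomputed, e.g. by inverting the source image).
    """
    z = z_src_per_step[0].copy()
    for z_src in z_src_per_step:
        z = optimize_latent(z, target)  # step 1: steer toward the constraint
        z = fuse(z, z_src, mask)        # step 2: restore the background
        # A real pipeline would apply one denoising step here.
    return z
```

Because fusion runs after every optimization pass, anything the optimizer changes outside the mask is overwritten by the source latents at that step, which is the background-protection behavior the overview describes.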
### Experimental Results
The authors conducted experiments on multiple datasets, including MagicO, PieBenchO, and SynO. The results show that FlexEdit outperforms existing editing methods in both background preservation and editing semantics. In particular, it offers greater flexibility and control in object replacement, object addition, and object removal.
### Formulas
- **Latent Optimization Loss Function**:
\[
L_{\text{pos}}=\| \text{centroid}_{j,t}-\text{centroid}^*_{t} \|_2^2
\]
\[
L_{\text{size}}=\| \text{size}_{j,t}-\text{size}^*_{t} \|_2^2
\]
- **Separation Loss Function**:
\[
L_{\text{sep}}=\frac{\sum_{k = 1}^{H\times W} f_{j,t,k}\cdot g_{i,k}}{\| f_{j,t} \|_2^2\cdot \| g_i \|_2^2}
\]
- **Latent Fusion**:
\[
z^*_t = z''_t\odot\hat{M}_t+z_t\odot(1 - \hat{M}_t)
\]
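In the fusion formula, \(z''_t\) denotes the edited latents, \(z_t\) the latents of the source image, and \(\hat{M}_t\) the adaptive binary mask at denoising step \(t\). Together, these formulas let FlexEdit control object attributes during editing while preserving the background.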
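To make the formulas concrete, here is a minimal NumPy sketch that evaluates each of them on plain arrays. Several details are assumptions, since the summary does not define them: the centroid is taken as the attention-weighted mean of normalized pixel coordinates, the size as the mean attention value, and all function names are hypothetical. The separation loss follows the formula exactly as written above, including the squared norms in the denominator.

```python
import numpy as np

def centroid_and_size(attn):
    """Centroid (row, col) and size of an H x W attention map.

    Assumption: centroid = attention-weighted mean of coordinates scaled to
    [0, 1]; size = mean attention value. The paper may define these differently.
    """
    h, w = attn.shape
    total = attn.sum() + 1e-8
    rows = np.arange(h)[:, None] / max(h - 1, 1)
    cols = np.arange(w)[None, :] / max(w - 1, 1)
    centroid = np.array([(attn * rows).sum(), (attn * cols).sum()]) / total
    return centroid, attn.mean()

def position_loss(attn, target_centroid):
    """L_pos: squared L2 distance between the centroid and its target."""
    centroid, _ = centroid_and_size(attn)
    return float(np.sum((centroid - np.asarray(target_centroid)) ** 2))

def size_loss(attn, target_size):
    """L_size: squared difference between the size and its target."""
    _, size = centroid_and_size(attn)
    return float((size - target_size) ** 2)

def separation_loss(f, g, eps=1e-8):
    """L_sep as written above: dot product over all H*W positions divided by
    the product of the squared L2 norms (eps avoids division by zero)."""
    f, g = f.ravel(), g.ravel()
    return float(f @ g / ((f @ f) * (g @ g) + eps))

def fuse(z_edit, z_src, mask):
    """Latent fusion: z* = z'' * M + z * (1 - M), applied element-wise."""
    return z_edit * mask + z_src * (1.0 - mask)
```

Two attention maps with no overlapping support give `separation_loss` equal to zero, so minimizing this term pushes the new object's attention away from the existing object's region.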