Peiye Zhuang, Oluwasanmi Koyejo, Alexander G. Schwing
Abstract: Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of transformation functions, and apply a content loss and an adversarial loss that encourages the maintenance of image identity and photo-realism. We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation. Our model permits better control for both single- and multiple-attribute editing while preserving image identity and realism during transformation. We provide empirical results for both natural and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation.
What problem does this paper attempt to address?
This paper addresses **controllable semantic image editing**. Specifically, the authors focus on achieving continuous control over image attributes by navigating the latent space of a Generative Adversarial Network (GAN) while preserving image identity and photo-realism. Existing methods often suffer from entangled attribute edits, global changes to image identity, and reduced photo-realism. To overcome these problems, the paper proposes a framework that learns multiple attribute transformations simultaneously, integrates attribute regression into the training of the transformation functions, and applies a content loss and an adversarial loss to encourage the preservation of image identity and photo-realism. The paper also proposes quantitative evaluation strategies for measuring controllable editing performance, in contrast to prior work, which relied mainly on qualitative evaluation.
### Main contributions:
1. **Simultaneous multi-attribute editing**: A joint sampling strategy allows multiple attributes to be edited at the same time, which makes editing more flexible.
2. **Attribute decoupling**: A regressor that predicts image attributes is integrated into training, which makes attribute edits more precise and reduces entanglement between attributes.
3. **Maintaining image identity and photo-realism**: A content loss and an adversarial loss are introduced to preserve image identity and photo-realism during editing.
4. **Quantitative evaluation**: Quantitative evaluation strategies are proposed to measure controllable editing performance, addressing prior work's reliance on mainly qualitative evaluation.
### Method overview:
- **Problem definition**: The setting is a fixed GAN model with a generator \( G \) and a discriminator \( D \), where the generator takes a latent vector \( z\in\mathbb{R}^m \) as input. The goal is to discover semantically meaningful directions \( T = \{d_1,\ldots,d_N\} \) in the latent space, along which the attributes of the synthesized image \( G(z) \) can be manipulated.
- **Objective function**: Minimize the weighted objective function \( \mathcal{L}=\lambda_1\mathcal{L}_{\text{reg}}+\lambda_2\mathcal{L}_{\text{disc}}+\lambda_3\mathcal{L}_{\text{content}} \), where:
  - **Regression loss** \( \mathcal{L}_{\text{reg}} \): Evaluates whether \( T \) has performed the transformation indicated by \( \epsilon \).
  - **Adversarial loss** \( \mathcal{L}_{\text{disc}} \): Uses the discriminator \( D \) to measure the quality of the generated image.
  - **Content loss** \( \mathcal{L}_{\text{content}} \): Estimates the distance between two images in order to maintain image identity.
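
The objective above can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the linear shift \( z' = z + \sum_i \epsilon_i d_i \), the squared-error form of the regression and content terms, the log-based discriminator term, and the small linear maps `G`, `R`, and `D` standing in for the fixed generator, the attribute regressor, and the discriminator are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, P = 8, 2, 16  # latent size, number of attributes, toy "pixel" count

# Hypothetical stand-ins for the fixed networks (NOT the real models).
W_g = rng.normal(size=(P, m))  # toy generator weights
W_r = rng.normal(size=(N, P))  # toy attribute-regressor weights
w_d = rng.normal(size=P)       # toy discriminator weights


def G(z):
    """Toy generator: maps a latent vector to a flat 'image'."""
    return W_g @ z


def R(x):
    """Toy regressor: predicts N attribute values from an image."""
    return W_r @ x


def D(x):
    """Toy discriminator: probability in (0, 1) that x looks real."""
    return 1.0 / (1.0 + np.exp(-(w_d @ x)))


def shift_latent(z, directions, eps):
    """Assumed linear edit: z' = z + sum_i eps_i * d_i."""
    return z + eps @ directions


def objective(z, directions, eps, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum lambda_1*L_reg + lambda_2*L_disc + lambda_3*L_content."""
    x = G(z)
    x_edit = G(shift_latent(z, directions, eps))
    # Regression term: did the attributes move by the requested eps? (assumed MSE)
    l_reg = np.mean((R(x_edit) - (R(x) + eps)) ** 2)
    # Adversarial term: penalize edits the discriminator scores as unrealistic.
    l_disc = -np.log(D(x_edit) + 1e-8)
    # Content term: keep the edited image close to the original (assumed MSE).
    l_content = np.mean((x_edit - x) ** 2)
    l1, l2, l3 = lambdas
    return l1 * l_reg + l2 * l_disc + l3 * l_content


z = rng.normal(size=m)
directions = rng.normal(size=(N, m))  # rows play the role of d_1, ..., d_N
eps = np.array([0.5, -0.25])          # requested change per attribute
loss = objective(z, directions, eps)
```

In a training loop, `directions` would be the learnable parameters and the weights \( \lambda_1, \lambda_2, \lambda_3 \) would trade off edit accuracy against realism and identity preservation. The values used here are placeholders.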
### Experimental results:
- **Natural scene datasets**: Experiments on the Transient Attribute Database and the MIT Places2 dataset show that the proposed method edits image details more effectively while preserving image identity.
- **Face datasets**: Experiments on the FFHQ and CelebA-HQ datasets show that the proposed method offers better controllability and better preserves image identity when editing multiple attributes.
In conclusion, this paper proposes an effective method for controllable semantic image editing. By navigating the latent space, it achieves continuous control over image attributes while preserving image identity and photo-realism.