Abstract: Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of transformation functions, and apply a content loss and an adversarial loss that encourages the maintenance of image identity and photo-realism. We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation. Our model permits better control for both single- and multiple-attribute editing while preserving image identity and realism during transformation. We provide empirical results for both natural and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation.

What problem does this paper attempt to address?

This paper addresses **controllable semantic image editing**. Specifically, the authors focus on achieving continuous control of image attributes by navigating the latent space of a Generative Adversarial Network (GAN) while preserving image identity and photo-realism. Existing methods often suffer from entangled attribute edits, global changes to image identity, and reduced photo-realism. To overcome these problems, the paper proposes a framework that learns multiple attribute transformations simultaneously, integrates attribute regression into the training of the transformation functions, and applies a content loss and an adversarial loss to encourage the preservation of image identity and photo-realism. The paper also proposes quantitative evaluation strategies for measuring controllable editing performance, in contrast to previous work, which relied mainly on qualitative evaluation.

### Main contributions:

1. **Simultaneous multi-attribute editing**: A joint sampling strategy lets multiple attributes be edited at the same time, which makes editing more flexible.
2. **Attribute decoupling**: A regressor that predicts image attributes is integrated into training, which makes attribute edits more precise and reduces entanglement between attributes.
3. **Maintaining image identity and photo-realism**: A content (perceptual) loss and an adversarial loss are introduced so that image identity and photo-realism are preserved during editing.
4. **Quantitative evaluation**: A quantitative evaluation method is proposed to measure controllable editing performance, addressing the reliance of previous work on qualitative evaluation.

### Method overview:

- **Problem definition**: Given a fixed GAN model consisting of a generator \( G \) and a discriminator \( D \), the input is a latent vector \( z\in\mathbb{R}^m \). The goal is to discover semantically meaningful directions \( T = \{d_1,\ldots,d_N\} \) in the latent space along which the attributes of the synthesized image \( G(z) \) can be manipulated.
- **Objective function**: Minimize the weighted objective \( \mathcal{L}=\lambda_1\mathcal{L}_{\text{reg}}+\lambda_2\mathcal{L}_{\text{disc}}+\lambda_3\mathcal{L}_{\text{content}} \), where:
  - **Regression loss** \( \mathcal{L}_{\text{reg}} \): Evaluates whether \( T \) has performed the transformation indicated by \( \epsilon \).
  - **Adversarial loss** \( \mathcal{L}_{\text{disc}} \): Uses the discriminator \( D \) to measure the quality of the generated image.
  - **Content loss** \( \mathcal{L}_{\text{content}} \): Estimates the distance between two images in order to maintain image identity.

### Experimental results:

- **Natural scene datasets**: Experiments on the Transient Attribute Database and the MIT Places2 dataset show that the method edits image details better while preserving image identity.
- **Face datasets**: Experiments on the FFHQ and CelebA-HQ datasets show that the method offers better controllability and better preserves image identity when editing multiple attributes.

In conclusion, this paper proposes an effective method for controllable semantic image editing. By navigating the latent space, it achieves continuous control of image attributes while preserving image identity and photo-realism.
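
The problem definition and weighted objective in the method overview can be illustrated with a minimal sketch. The summary does not give the exact form of each term, so everything concrete here is an assumption: the edit is taken to be linear, \( z' = z + \sum_i \epsilon_i d_i \); the regression target is taken to be the original predicted attributes shifted by \( \epsilon \); the adversarial term is the negated discriminator score; and a pixel-wise mean squared error stands in for the content (perceptual) distance. The names `edit_latent`, `total_loss`, and the callables `G`, `D`, and `R` (an attribute regressor) are hypothetical placeholders, not the authors' code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def edit_latent(z: Sequence[float],
                directions: Sequence[Sequence[float]],
                eps: Sequence[float]) -> Vector:
    """Assumed linear edit: z' = z + sum_i eps_i * d_i."""
    z_edit = list(z)
    for step, d in zip(eps, directions):
        for k, d_k in enumerate(d):
            z_edit[k] += step * d_k
    return z_edit


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(z: Sequence[float],
               directions: Sequence[Sequence[float]],
               eps: Sequence[float],
               G: Callable[[Sequence[float]], Vector],
               D: Callable[[Sequence[float]], float],
               R: Callable[[Sequence[float]], Vector],
               lambdas: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """L = lambda_1 * L_reg + lambda_2 * L_disc + lambda_3 * L_content."""
    lam_reg, lam_disc, lam_content = lambdas
    x = G(z)                                      # original image
    x_edit = G(edit_latent(z, directions, eps))   # edited image

    # Regression term (assumed form): predicted attributes of the edited
    # image should equal the original predictions shifted by eps.
    target = [a + e for a, e in zip(R(x), eps)]
    loss_reg = mse(R(x_edit), target)

    # Adversarial term (assumed sign convention): a higher D score means
    # "more realistic", so the score is negated to form a loss.
    loss_disc = -D(x_edit)

    # Content term: pixel-wise MSE used here as a simple stand-in for a
    # perceptual distance between the edited and original images.
    loss_content = mse(x_edit, x)

    return lam_reg * loss_reg + lam_disc * loss_disc + lam_content * loss_content
```

In a real setting `G`, `D`, and `R` would be pretrained networks and the directions would be optimized by gradient descent on this loss; the sketch only shows how the three terms combine and how a joint step vector \( \epsilon \) moves several attributes at once.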

Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation
