Abstract: Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) synthesis of a facial sketch from attributes using a CVAE architecture, (2) enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) synthesis of a face from the sketch using another GAN-based network. Extensive experiments and comparisons with recent methods are performed to verify the effectiveness of the proposed attribute-based three-stage face synthesis method.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the automatic synthesis of facial images from visual attributes. Specifically, the paper proposes a three-stage framework based on a Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), named Attribute2Sketch2Face (A2S2F), for generating high-quality facial images from given facial attributes. This process is divided into three stages:

1. **Attribute-to-Sketch (A2S)**:
   - Use the CVAE architecture to generate facial sketches from visual attributes.
   - The goal of this stage is to generate rough facial sketches from texture attributes and noise vectors.
2. **Sketch-to-Sketch (S2S)**:
   - Use a GAN-based framework to further enhance the rough sketches generated in the A2S stage to generate clearer sketches.
   - This stage uses the AUDeNet (Attribute-preserving Dense UNet) generator, which combines the advantages of UNet and DenseNet to improve the quality of sketches.
3. **Sketch-to-Face (S2F)**:
   - Use another GAN-based framework to generate facial images from the enhanced sketches.
   - The generator in this stage combines texture and color attributes to generate high-quality facial images.

Through these three stages, the paper aims to generate high-quality facial images from given facial attributes, and this technology has broad application prospects in fields such as law enforcement and entertainment. For example, in the absence of a suspect's facial image, the suspect's facial image can be generated by describing the suspect's characteristics to assist in criminal investigations.
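The data flow through the three stages can be sketched as a simple composition of functions. This is a minimal illustration, not the paper's implementation: the class name `A2S2FPipeline`, the `toy_*` stand-in stages, and the choice to pass the texture attributes to the S2S stage are all assumptions made for the example. In the real framework each stage is a trained neural network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]
Sketch = List[List[float]]                      # grayscale sketch, values in [0, 1]
Face = List[List[Tuple[float, float, float]]]   # RGB face image


@dataclass
class A2S2FPipeline:
    """Hypothetical wrapper that chains the three stages described above."""
    a2s: Callable[[Vector, Vector], Sketch]  # attributes + noise -> coarse sketch
    s2s: Callable[[Sketch, Vector], Sketch]  # coarse sketch -> sharper sketch
    s2f: Callable[[Sketch, Vector], Face]    # sharp sketch + attributes -> face

    def generate(self, texture_attrs: Vector, color_attrs: Vector,
                 noise: Vector) -> Face:
        coarse = self.a2s(texture_attrs, noise)
        # Assumption: the enhancement stage is conditioned on texture attributes.
        sharp = self.s2s(coarse, texture_attrs)
        # The final stage sees both texture and color attributes.
        return self.s2f(sharp, texture_attrs + color_attrs)


# Toy stand-ins so the sketch runs; they are NOT the CVAE or GAN generators.
def toy_a2s(attrs: Vector, noise: Vector, size: int = 4) -> Sketch:
    base = sum(attrs) / max(len(attrs), 1)
    return [[base + noise[(r * size + c) % len(noise)] for c in range(size)]
            for r in range(size)]


def toy_s2s(sketch: Sketch, attrs: Vector) -> Sketch:
    # Stand-in "enhancement": clamp pixel values to [0, 1].
    return [[min(1.0, max(0.0, p)) for p in row] for row in sketch]


def toy_s2f(sketch: Sketch, attrs: Vector) -> Face:
    # Stand-in "colorization": tint each pixel by the mean attribute value.
    tint = sum(attrs) / max(len(attrs), 1)
    return [[(p, p * tint, p * (1.0 - tint)) for p in row] for row in sketch]


if __name__ == "__main__":
    pipeline = A2S2FPipeline(toy_a2s, toy_s2s, toy_s2f)
    face = pipeline.generate([1.0, 0.0], [1.0], [0.1, -0.2, 0.3])
    print(len(face), len(face[0]), len(face[0][0]))
```

Keeping the stages as separate callables mirrors the stage-wise formulation: each stage can be trained or replaced independently as long as its input and output types match.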
### Formula Summary

- **Variational Lower Bound of CVAE**:
  \[ L_{\text{CVAE}}(x, y; \theta, \phi) = -\text{KL}(q_\phi(z | x, y) \| p_\theta(z)) + \mathbb{E}_{z \sim q_\phi(z | x, y)}[\log p_\theta(x | y, z)] \]
- **Objective Function of Conditional GAN**:
  \[ L_{\text{cGAN}}(G, D) = \mathbb{E}_{x, y \sim P_{\text{data}}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim P_{\text{data}}(x), z \sim p_z(z)}[\log(1 - D(x, G(x, z)))] \]
- **Loss Function of A2S Stage**:
  \[ L_{\text{A2S}} = L_{\text{CVAE}}(s, a; \phi, \theta) - \lambda \, \text{KL}(q_\beta(z | n, a) \| p_\theta(z)) \]
- **Loss Function of S2S Stage**:
  \[ L = L_{\text{A}} + \lambda_1 L_1 + \lambda_2 L_{\text{perp}} \]
  where:
  - \( L_{\text{A}} \) is the adversarial loss
  - \( L_1 \) is the loss based on the L1-norm
  - \( L_{\text{perp}} \) is the perceptual loss
- **Perceptual Loss**:
  \[ L_{\text{perp}} = \| V(s_g) - V(s) \|_1 \]
  where \( V \) represents the feature representation of a certain layer of the pre-trained VGG-16 network.

Using these methods and techniques, the paper addresses the problem of generating high-quality facial images from visual attributes and reports experiments on multiple datasets to demonstrate the effectiveness of the approach.
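As a rough numerical illustration of the terms summarized above, the sketch below evaluates a KL term and the S2S generator objective in plain Python. Several things here are assumptions not stated in the summary: the posterior is taken to be a diagonal Gaussian and the prior \( p_\theta(z) \) a standard normal (which gives the closed-form KL used here), the adversarial term is taken as the generator's part of the cGAN objective, \( \log(1 - D) \), the weights `lambda_1` and `lambda_2` are placeholder values, and `toy_features` is a hypothetical stand-in for the VGG-16 feature map \( V \).

```python
import math
from typing import Callable, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), assuming a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 norm of the difference between two flattened images or features."""
    return sum(abs(x - y) for x, y in zip(a, b))


def toy_features(sketch: Sequence[float]) -> list:
    # Hypothetical stand-in for V(.): differences between neighboring pixels.
    return [sketch[i + 1] - sketch[i] for i in range(len(sketch) - 1)]


def s2s_generator_loss(generated: Sequence[float],
                       target: Sequence[float],
                       d_fake: float,
                       features: Callable[[Sequence[float]], Sequence[float]],
                       lambda_1: float = 1.0,
                       lambda_2: float = 1.0) -> float:
    """L = L_A + lambda_1 * L_1 + lambda_2 * L_perp for one flattened sketch.

    d_fake is the discriminator's probability (strictly between 0 and 1)
    that the generated sketch is real.
    """
    adversarial = math.log(1.0 - d_fake)           # generator's cGAN term
    l1 = l1_distance(generated, target)            # pixel-level L1 loss
    perceptual = l1_distance(features(generated),  # ||V(s_g) - V(s)||_1
                             features(target))
    return adversarial + lambda_1 * l1 + lambda_2 * perceptual


if __name__ == "__main__":
    print(kl_to_standard_normal([1.0], [0.0]))
    print(s2s_generator_loss([0.2, 0.5, 0.9], [0.0, 0.5, 1.0], 0.3, toy_features))
```

When the generated sketch equals the target, the L1 and perceptual terms vanish and only the adversarial term remains, which makes the role of each term in the weighted sum easy to check.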

Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs

Facial Synthesis from Visual Attributes via Sketch using Multi-Scale Generators

Attribute-Guided Sketch Generation

Joint Sketch-Attribute Learning for Fine-Grained Face Synthesis.

Multimodal Face Synthesis From Visual Attributes

Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs

Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network

Composition-Aided Face Photo-Sketch Synthesis.

Toward Realistic Face Photo–Sketch Synthesis via Composition-Aided GANs

Recognizing Facial Sketches by Generating Photorealistic Faces Guided by Descriptive Attributes

Quality Guided Sketch-to-Photo Image Synthesis

Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning

Fine-Granularity Face Sketch Synthesis

HCGAN: hierarchical contrast generative adversarial network for unpaired sketch face synthesis

Appearance and Shape Based Image Synthesis by Conditional Variational Generative Adversarial Network

Deep Neural Representation Guided Face Sketch Synthesis

Recognizing Minimal Facial Sketch by Generating Photorealistic Faces with the Guidance of Descriptive Attributes

High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks

Attribute-controlled Face Photo Synthesis from Simple Line Drawing.

High-Quality Synthesized Face Sketch Using Generative Reference Prior

Deep Generation of Face Images from Sketches