Abstract: Visual content, such as movies, animations, computer games, videos and photos, is produced and consumed on a massive scale nowadays. Most of this content combines material captured from the real world with content synthesized by computers. In particular, computer-generated visual content is increasingly indispensable in modern entertainment and production. The generation of visual content by computers is typically conditioned on real-world material, driven by the imagination of designers and artists, or a combination of both. However, creating visual content manually is both challenging and labor intensive. Enabling computers to synthesize the needed visual content automatically or semi-automatically has therefore become essential. Among these efforts, one stream of research generates novel images based on given image priors, e.g., photos and sketches. This research direction is known as image-conditional image generation, and it covers a wide range of topics such as image stylization, image completion, image fusion, sketch-to-image generation, and extracting image label maps. In this thesis, a set of novel approaches for image-conditional image generation is presented. The thesis starts with an exemplar-based method for facial image stylization in Chapter 2. This method involves a unified framework for facial image stylization based on a single style exemplar. A two-phase procedure is employed: the first phase searches for a dense, semantics-aware correspondence between the input and the exemplar images, and the second phase conducts edge-preserving texture transfer. While this algorithm has the merit of requiring only a single exemplar, it is constrained to face photos. To perform generalized image-to-image translation, Chapter 3 presents a data-driven, learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which enables image translators to be trained from two sets of unlabeled images from two domains. This is followed by another data-driven method in Chapter 4, which learns multiscale manifolds from a set of images and then synthesizes novel images that mimic the appearance of the target image dataset. The method is named Branched Generative Adversarial Network (BranchGAN) and employs a novel training method that enables unconditioned generative adversarial networks (GANs) to learn image manifolds at multiple scales. As a result, latent manifold codes that are associated with specific feature scales can be directly manipulated and even combined. Finally, to give users more control over image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD) that significantly improves the art of manipulating high-resolution images by utilizing the multi-scale manifold learned with BranchGAN.

Appearance and Shape Based Image Synthesis by Conditional Variational Generative Adversarial Network

Statistics Enhancement Generative Adversarial Networks for Diverse Conditional Image Synthesis

Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs

Conditional Face Synthesis for Data Augmentation

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training

Pose- and Attribute-consistent Person Image Synthesis

MVFSIGM: multi-variant feature-based synthesis image generation model for improved stability using generative adversarial network

Person Image Synthesis Through Siamese Generative Adversarial Network

Spatially Constrained GAN for Face and Fashion Synthesis

Multimodal Face Synthesis From Visual Attributes

Photo-realistic Image Synthesis from Lines and Appearance with Modular Modulation

Conditional Image Synthesis With Auxiliary Classifier GANs

Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN

VAE/WGAN-Based Image Representation Learning for Pose-Preserving Seamless Identity Replacement in Facial Images

Facial Synthesis from Visual Attributes via Sketch using Multi-Scale Generators

Latent Vector Prototypes Guided Conditional Face Synthesis

From Rule-Based to Learning-Based Image-Conditional Image Generation

CoGAN: Cooperatively Trained Conditional and Unconditional GAN for Person Image Generation

3D-aware Image Generation and Editing with Multi-modal Conditions

HumanGAN: A Generative Model of Humans Images

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis