Abstract: Generative Adversarial Networks (GANs) have recently achieved significant improvements on paired and unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist painting style transfer. However, existing models can only transfer low-level information (e.g., color or texture changes) and fail to edit the high-level semantic meaning (e.g., geometric structure or content) of objects. On the other hand, while some studies can synthesize compelling real-world images given a class label or caption, they cannot condition on arbitrary shapes or structures, which largely limits their application scenarios and the interpretability of their results. In this work, we focus on a more challenging semantic manipulation task, which aims to modify the semantic meaning of an object while preserving its own characteristics (e.g., viewpoint and shape), such as cow$\rightarrow$sheep, motor$\rightarrow$bicycle, and cat$\rightarrow$dog. To tackle such large semantic changes, we introduce a contrasting GAN (contrast-GAN) with a novel adversarial contrasting objective. Instead of directly making the synthesized samples close to the target data as previous GANs did, our adversarial contrasting objective optimizes over distance comparisons between samples; that is, it enforces the manipulated data to be semantically closer to real data of the target category than the input data is. Equipped with the new contrasting objective, a novel mask-conditional contrast-GAN architecture is proposed to disentangle the image background from object semantic changes. Experiments on several semantic manipulation tasks on the ImageNet and MSCOCO datasets show a considerable performance gain of our contrast-GAN over other conditional GANs. Quantitative results further demonstrate the superiority of our model in generating manipulated results with high visual fidelity and reasonable object semantics.
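
The distance-comparison idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function names (`euclidean`, `contrasting_loss`), the use of Euclidean distance on feature vectors, and the softmax-over-negative-distances form are hypothetical choices, not the authors' stated objective. The sketch only shows the comparison itself: the loss is small when the manipulated sample lies closer to a target-category feature than the input sample does.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrasting_loss(manipulated_feat, input_feat, target_feat):
    """Hypothetical distance-comparison loss (an assumption, not the paper's formula).

    Computes -log of the softmax probability assigned to the manipulated
    sample when both samples are scored by their negative distance to the
    target-category feature. Algebraically this equals
    softplus(d_manipulated - d_input).
    """
    d_manip = euclidean(manipulated_feat, target_feat)
    d_input = euclidean(input_feat, target_feat)
    z = d_manip - d_input
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Toy 2-D features standing in for semantic embeddings (illustrative values).
target = [1.0, 0.0]        # mean feature of real target-category data
source = [0.0, 1.0]        # feature of the input image
good_edit = [0.9, 0.1]     # manipulated sample that moved toward the target
no_edit = [0.0, 1.0]       # manipulated sample identical to the input

loss_good = contrasting_loss(good_edit, source, target)
loss_none = contrasting_loss(no_edit, source, target)
```

Under these assumptions, a manipulated sample that has not moved relative to the input yields a loss of $\log 2$, and the loss falls toward zero as the manipulated sample approaches the target feature. In an adversarial setting the feature extractor would be learned by the discriminator rather than fixed as here.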

SIMGAN: Photo-Realistic Semantic Image Manipulation Using Generative Adversarial Networks

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge

Dual Attention GANs for Semantic Image Synthesis

Text-to-image Generation Based on Spatial-Channel Attention and Semantic Redescription

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Adversarial Pixel-Level Generation of Semantic Images

A Shared Representation for Photorealistic Driving Simulators

Generative Semantic Manipulation with Contrasting GAN

SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis

Semantic Image Synthesis via Adversarial Learning

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation

Semantic Image Synthesis with Unconditional Generator

Handwritten Digits Image Generation with help of Generative Adversarial Network: Machine Learning Approach

Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network

Semantic Draw Engineering for Text-to-Image Creation

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

Local and Global GANs with Semantic-Aware Upsampling for Image Generation

Semantic prior guided fine-grained facial expression manipulation

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis