Abstract: Text-to-image (T2I) synthesis aims to generate photo-realistic images from text descriptions and is a particularly important task for bridging vision and language. Each generated image consists of two parts: a content part related to the text and a style part irrelevant to the text. Existing discriminators do not distinguish between the content part and the style part. This not only prevents T2I synthesis models from generating the content part effectively but also makes it difficult to manipulate the style of the generated image. In this paper, we propose a modality disentangled discriminator that distinguishes between the content part and the style part at a specific layer. Specifically, we use two losses to force a certain number of early layers in the discriminator to act as a disentangled representation extractor. The extracted common representation for the content part makes the discriminator more effective at capturing the text-image correlation, while the extracted modality-specific representation for the style part can be transferred directly to other images. Combining these two representations can also improve the quality of the generated images. Our proposed discriminator replaces the discriminator of each stage in the representative model AttnGAN and the state-of-the-art (SOTA) model DM-GAN. Extensive experiments on three widely used T2I synthesis datasets, i.e., CUB, Oxford-102, and COCO, demonstrate the superior performance of the modality disentangled discriminator over the base models. Code for DM-GAN with our modality disentangled discriminator is available at https://github.com/FangxiangFeng/DM-GAN-MDD.
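The abstract does not specify the extractor architecture or the two losses, so what follows is only a toy sketch of the disentanglement idea, not the authors' implementation (their repository holds that). It assumes, hypothetically, that the feature extracted at the chosen discriminator layer can be split at a fixed index into a content part and a style part. The names `split_representation`, `transfer_style`, `text_match_score`, and `CONTENT_DIM` are illustrative inventions, and cosine similarity stands in for whatever text-image matching the real model uses. The sketch shows the two properties described above: the text-matching score depends only on the content part, and the style part can be moved into another image's representation.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL split point: the first CONTENT_DIM entries of the extracted
# feature stand for the common (text-related) content representation and the
# remaining entries for the modality-specific (style) representation.
CONTENT_DIM = 4


def split_representation(
    feature: Sequence[float], content_dim: int = CONTENT_DIM
) -> Tuple[List[float], List[float]]:
    """Split one image's feature vector into (content, style) parts."""
    if not 0 < content_dim < len(feature):
        raise ValueError("content_dim must leave room for both parts")
    return list(feature[:content_dim]), list(feature[content_dim:])


def transfer_style(
    feature_a: Sequence[float], feature_b: Sequence[float], content_dim: int = CONTENT_DIM
) -> List[float]:
    """Recombine image A's content part with image B's style part."""
    content_a, _ = split_representation(feature_a, content_dim)
    _, style_b = split_representation(feature_b, content_dim)
    return content_a + style_b


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_match_score(
    feature: Sequence[float], text_embedding: Sequence[float], content_dim: int = CONTENT_DIM
) -> float:
    """Score text-image agreement from the content part only (style is ignored)."""
    content, _ = split_representation(feature, content_dim)
    return cosine(content, text_embedding)


if __name__ == "__main__":
    a = [1.0, 0.0, 0.0, 0.0, 9.0, 9.0]  # toy content vector plus style (9, 9)
    b = [0.0, 1.0, 0.0, 0.0, 5.0, 5.0]  # different content plus style (5, 5)
    text = [1.0, 0.0, 0.0, 0.0]         # toy text embedding matching a's content
    mixed = transfer_style(a, b)
    print(mixed)
    print(text_match_score(a, text), text_match_score(mixed, text))
```

Because the matching score reads only the content slice, swapping in another image's style leaves it unchanged (both scores print 1.0 here). In the actual model the separation would have to be learned through the training losses rather than fixed by an index.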

A Discriminator Improves Unconditional Text Generation without Updating the Generator

Adding A Filter Based on The Discriminator to Improve Unconditional Text Generation

Discriminator Modification in GAN for Text-to-Image Generation

Improving GANs with A Dynamic Discriminator

Which Discriminator for Cooperative Text Generation?

A Universal Discriminator for Zero-Shot Generalization

Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator

Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation

Sequence Generative Adversarial Nets with a Conditional Discriminator

A novel hybrid augmented loss discriminator for text-to-image synthesis

An Attention-Based Approach to Accelerating Sequence Generative Adversarial Nets

A Multi-Player Minimax Game for Generative Adversarial Networks

Text to Image Synthesis Based on Multiple Discrimination

The detection of distributional discrepancy for language GANs

Enhancing Text Generation with Cooperative Training

Modality Disentangled Discriminator for Text-to-Image Synthesis

Adversarial Discrete Sequence Generation Without Explicit Neural Networks As Discriminators

Differentiated Distribution Recovery for Neural Text Generation