Text to image synthesis using multi-generator text conditioned generative adversarial networks.

Min Zhang,Chunye Li,Zhiping Zhou
DOI: https://doi.org/10.1007/s11042-020-09965-5
2021-01-01
Abstract:Recently, Generative Adversarial Network(GAN) has been the most mainstream technology in the task of Text to Image. However, the vanilla deep neural networks tend to approximate continuous mappings in real generation tasks rather than discontinuous mappings with discrete points. When training on datasets with multiple types, GAN fails to synthesize diverse images, which we call as mode collapse. To deal with it, we propose the Multi-generator Text Conditioned Generative Adversarial Network (MTC-GAN) in this paper. Textual description of real images is embedded on the noise vector as a constraint. Based on Deep Convolutional Generative Adversarial Networks(DCGAN), multiple generators are incorporated to capture high probability among the target distribution. To identify the generated fake sample from a particular generator, the discriminator must enforce multiple generators to have different identifiable modes. The method based on global constraints can make the generated images more diverse. Multiple generators can improve the particular functional shape of the discriminators indirectly, which should make the GAN more stable when trained in high dimensional spaces. The experimental results on the standard dataset demonstrate the good performance of the proposed method. The problem of mode collapse can be improved, and the generated samples can be more diverse.
What problem does this paper attempt to address?