Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis
Jun Peng,Yiyi Zhou,Xiaoshuai Sun,Liujuan Cao,Yongjian Wu,Feiyue Huang,Rongrong Ji
DOI: https://doi.org/10.1109/tmm.2021.3116416
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract:Text-to-Image (T2I) synthesis is a challenging task that aims to convert natural language descriptions to real images. It remains an open problem mainly due to the diversity of text descriptions, which poses a huge obstacle in generating vivid and relevant images. Moreover, the existing evaluation metrics in T2I synthesis are mainly used to evaluate the visual quality of the generated images, while the semantic consistency between the two modalities is often ignored. To address these issues, we present a novel Knowledge-Driven Generative Adversarial Network , termed KD-GAN, and a new evaluation system, named Pseudo Turing Test (PTT for short). Concretely, KD-GAN takes a further step in imitating the behavior of human painting, i.e. , drawing an image according to reference knowledge. The introduction of reference knowledge in KD-GAN not only improves the quality of the generated images but also enhances the semantic consistency between them and the input texts. In addition, KD-GAN can also greatly avoid some flaws against common sense during image generation, e.g. , skiing in the blue sky. The proposed PTT is an important supplement to the existing evaluation system of T2I synthesis. It includes a set of pseudo-experts of different multimedia tasks to evaluate the semantic consistency between the given texts and the generated images. To validate the proposed KD-GAN, we conducted extensive experiments on two benchmark datasets, i.e. , Caltech-UCSD Birds (CUB), and MS-COCO (COCO). The experimental results demonstrate that KD-GAN outperforms state-of-the-art methods on IS, FID, and the proposed PTT metrics. 1 1The codes of KD-GAN are at [Online]. Available: https://github.com/pengjunn/KD-GAN and the codes and models of PTT are at [Online]. Available: https://github.com/pengjunn/PTT.