CD-GAN: Commonsense-Driven Generative Adversarial Network with Hierarchical Refinement for Text-to-Image Synthesis

Guokai Zhang, Ning Xu, Chenggang Yan, Bolun Zheng, Yulong Duan, Bo Lv, An-An Liu
DOI: https://doi.org/10.34133/icomputing.0017
2023-01-01
Abstract: Synthesizing vivid images from descriptive text is emerging as a frontier cross-domain generation task. However, a single sentence is inadequate for accurately generating a high-quality image because of the information asymmetry between modalities; external knowledge is needed to balance the process. Moreover, the limited description of entities in the sentence cannot guarantee semantic consistency between the text and the generated image, which leads to missing detail in the foreground and background. Here, we propose a commonsense-driven generative adversarial network (CD-GAN) that generates photo-realistic images using entity-related commonsense knowledge. CD-GAN contains 2 key commonsense-based modules: (a) entity semantic augment, which enriches entity semantics with commonsense knowledge to reduce the information asymmetry, and (b) adaptive entity refinement, which generates the high-resolution image over multiple stages, guided by various kinds of commonsense knowledge, to maintain text-image consistency. We present extensive synthesis results on the widely used CUB-birds (Caltech-UCSD Birds-200-2011) dataset, where our model achieves competitive results compared with other state-of-the-art models.