Text-to-Image Synthesis via Visual-Memory Creative Adversarial Network.

Shengyu Zhang, Hao Dong, Wei Hu, Yike Guo, Chao Wu, Di Xie, Fei Wu
DOI: https://doi.org/10.1007/978-3-030-00764-5_38
2018-01-01
Abstract: Despite recent advances, text-to-image generation on complex datasets such as MSCOCO, where each image contains varied objects, remains a challenging task. In this paper, we propose a method named visual-memory Creative Adversarial Network (vmCAN) to generate images from their corresponding narrative sentences. vmCAN leverages an external visual knowledge memory in both multimodal fusion and image synthesis. By conditioning synthesis on both the internal textual description and externally triggered "visual proposals", our method boosts the inception score of the baseline method by 17.6% on the challenging COCO dataset.
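For readers unfamiliar with the metric quoted in the abstract, the inception score is conventionally defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's predicted class distribution for a generated image x and p(y) is the average of those distributions over all generated images. The following is a minimal sketch of that definition only, not the authors' evaluation code. In practice the probabilities come from a pretrained Inception network; here they are passed in as plain lists. The function names `inception_score` and `relative_gain`, and all numbers in the usage lines, are illustrative assumptions, and reading the 17.6% as a relative gain over the baseline is likewise an assumption.

```python
import math


def inception_score(probs):
    """Inception score from per-image class probabilities.

    probs: list of lists; probs[i][k] is p(y=k | x_i) and each row sums to 1.
    In a real evaluation these rows would come from a pretrained Inception
    classifier; supplying them directly is an assumption of this sketch.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    # Mean of KL(p(y|x) || p(y)); terms with zero probability contribute 0.
    mean_kl = 0.0
    for row in probs:
        mean_kl += sum(
            pk * math.log(pk / mk) for pk, mk in zip(row, marginal) if pk > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.1 means 10%)."""
    return (improved - baseline) / baseline


# Toy inputs (hypothetical, not taken from the paper).
uniform = [[0.5, 0.5], [0.5, 0.5]]   # uninformative predictions
confident = [[1.0, 0.0], [0.0, 1.0]]  # confident and diverse predictions
print(inception_score(uniform))
print(inception_score(confident))
print(relative_gain(10.0, 11.0))
```

The score is lowest when every image receives the same uninformative prediction and rises when individual predictions are confident while the set of images as a whole covers many classes.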