DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis

Bing Yang, Xueqin Xiang, Wangzeng Kong, Jianhai Zhang, Yong Peng
DOI: https://doi.org/10.1109/tmm.2024.3358086
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Text-to-image synthesis aims to generate high-quality, realistic images conditioned on text descriptions. The main challenge of this task lies in deeply and seamlessly integrating image and text information. Thus, in this paper, we propose a deep multimodal fusion generative adversarial network (DMF-GAN) that allows effective semantic interactions for fine-grained text-to-image generation. Specifically, through a novel recurrent semantic fusion network, DMF-GAN can consistently manipulate the global assignment of text information among isolated fusion blocks. With the assistance of a multi-head attention module, DMF-GAN can model word information from different perspectives and further improve semantic consistency. In addition, a word-level discriminator is proposed to provide the generator with fine-grained feedback related to each word. Compared with current state-of-the-art methods, our proposed DMF-GAN can efficiently synthesize realistic, text-aligned images and achieves better performance on challenging benchmarks. Code: https://github.com/xueqinxiang/DMF-GAN
computer science, information systems, telecommunications, software engineering
What problem does this paper attempt to address?