Dualattn-GAN: Text to Image Synthesis with Dual Attentional Generative Adversarial Network.

Yali Cai,Xiaoru Wang,Zhihong Yu,Fu Li,Peirong Xu,Yueli Li,Lixian Li
DOI: https://doi.org/10.1109/access.2019.2958864
IF: 3.9
2019-01-01
IEEE Access
Abstract:Recent generative adversarial network based methods have shown promising results for the charming but challenging task of synthesizing images from text descriptions. These approaches can generate images with general shape and color but often produce distorted global structures with unnatural local semantic details. It is due to ineffectiveness of convolutional neural networks in capturing the high-level semantic information for pixel-level image synthesis. In this paper, we propose a Dual Attentional Generative Adversarial Network (DualAttn-GAN) in which the dual attention modules are introduced to enhance local details and global structures by attending to related features from relevant words and different visual regions. As one of the dual modules, the textual attention module is designed to explore the fine-grained interaction between vision and language. On the other hand, visual attention module models internal representations of vision from channel and spatial axes, which can better capture the global structures. Meanwhile, we apply an attention embedding module to merge multi-path features. Furthermore, we present an inverted residual structure to boost representation power of CNNs and apply spectral normalization to stabilize GAN training. With extensive experimental validation on two benchmark datasets, our method significantly improves state-of-the-art models over the evaluation metrics of inception score and Fréchet inception distance.
What problem does this paper attempt to address?