Hybrid Attention Driven Text-To-Image Synthesis Via Generative Adversarial Networks

Qingrong Cheng, Xiaodong Gu
DOI: https://doi.org/10.1007/978-3-030-30493-5_47
2019-01-01
Abstract:With the development of generative models, image synthesis conditioned on the specific variable becomes an important research theme gradually. This paper presents a novel spectral normalization based Hybrid Attentional Generative Adversarial Networks (HAGAN) for text to image synthesis. The hybrid attentional mechanism is composed of text-image cross-modal attention and self-attention of image sub regions. Cross-modal attention mechanism contributes to synthesize more fine-grained and text-related image by introducing word-level semantic information in generative model. The self-attention solves the long distance reliance of image local-region features when generate image. With spectral normalization, the training of GANs become more stable than traditional GANs, which conduces to avoid model collapse and gradient vanishing or explosion. We conduct experiments on widely used Oxford-102 flower dataset and CUB bird dataset to validate our proposed method. During quantitative and non-quantitative experimental comparison, the results indicate that the proposed method achieves the best performance on Inception score (IS), Frechet Inception Distance (FID) and visual effect.
What problem does this paper attempt to address?