ARRPNGAN: Text-to-image GAN with attention regularization and region proposal networks

Fengnan Quan, Bo Lang, Yanxi Liu
DOI: https://doi.org/10.1016/j.image.2022.116728
2022-04-01
Abstract: Although text-to-image synthesis has shown remarkable success in generating high-resolution, photorealistic, and semantically consistent images, it still struggles to generate images with complex backgrounds. In this paper, we address this problem by proposing a novel generative adversarial text-to-image synthesis framework based on attention regularization modules and region proposal networks (ARRPNGAN). ARRPNGAN precisely locates the keywords in the text by exploiting the advantages of attention models, and it improves the accuracy of locating the subimages of target objects with the help of a region proposal network (RPN). By leveraging both attention regularization and the RPN, the generative adversarial network (GAN) can capture as much of the text description's semantics as possible and reduce the interference of complex background information. The results of extensive experiments on the Caltech-UCSD Birds and MS COCO datasets demonstrate that the proposed ARRPNGAN significantly outperforms other state-of-the-art text-to-image methods, especially in generating photorealistic images with complex backgrounds. Code is available at: https://github.com/quanFN/ARRPNGAN.