Text-to-image Generation Based on Spatial-Channel Attention and Semantic Redescription

Guangxi Chen, Qiaochu Li, Min Tang
DOI: https://doi.org/10.1109/tocs53301.2021.9689032
2021-01-01
Abstract: Generating images from a natural language description is a challenging task whose goal is to produce images that are visually realistic and semantically consistent with the text. In this article, we propose a text-to-image-to-text model (SymmetricGAN) based on a spatial-channel attention mechanism. The model has two modules: the spatial-channel attention module (SCAM) and the image semantic redescription module (ISRM). The spatial-channel attention module separates different visual attributes so that the model pays more attention to the sub-regions corresponding to the most relevant words, and it generates target images from coarse to fine under a cascade architecture. The image semantic redescription module regenerates the description text from the image to ensure that the synthesized image is consistent with the original semantic description. We present comparative results on two general datasets.