Text-guided Image Generation Based on Ternary Attention Mechanism Generative Adversarial Network

Jie Yang, Han Liu, Jing Xin, Youmin Zhang
DOI: https://doi.org/10.1109/isas61044.2024.10552390
2024-01-01
Abstract: Synthesizing high-quality photorealistic images from textual descriptions is a challenging task. Existing text-to-image generative adversarial networks typically use a stacked structure as the core network, but this approach has drawbacks: the network becomes increasingly complex and large, and the generated images are not natural or clear enough. They look like simple combinations of rough shapes and scant detail features, lack fine-grained information, and leave room for improvement in visual realism and diversity. To this end, we propose a simple and effective model for text-to-image synthesis, a ternary attention-based generative adversarial network. It uses a single generator-discriminator pair as its underlying structure: the generator combines triple attention mechanisms to fuse fine-grained image features, and the discriminator combines a matching-aware gradient penalty with a one-way output mechanism for adversarial training. The proposed method effectively synthesizes realistic, text-matched images and achieves better performance on the widely used CUB and COCO datasets than current state-of-the-art methods.
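The abstract names a matching-aware gradient penalty and a one-way (single-logit) discriminator output but does not give their formulas. The sketch below is only an illustration of how such terms are commonly written, assuming the hinge-loss form and the hyperparameters k=2, p=6 used in DF-GAN, which may differ from this paper. The bilinear toy discriminator and all function names are hypothetical and chosen so that gradients can be computed in closed form without a deep learning library.

```python
import math


def hinge_d_loss(real_match, fake_match, real_mismatch):
    """One-way hinge loss over scalar logits D(image, text).

    real_match:    logits for real images with their matching text
    fake_match:    logits for generated images with the conditioning text
    real_mismatch: logits for real images paired with the wrong text
    The 1/2 weights follow the DF-GAN formulation (an assumption here).
    """
    def mean(values):
        return sum(values) / len(values)

    loss_real = mean([max(0.0, 1.0 - s) for s in real_match])
    loss_fake = mean([max(0.0, 1.0 + s) for s in fake_match])
    loss_mismatch = mean([max(0.0, 1.0 + s) for s in real_mismatch])
    return loss_real + 0.5 * (loss_fake + loss_mismatch)


def bilinear_grads(x, w, e):
    """Gradients of the toy discriminator D(x, e) = x^T W e.

    Returns (dD/dx, dD/de) = (W e, W^T x).
    """
    grad_x = [sum(w[i][j] * e[j] for j in range(len(e))) for i in range(len(x))]
    grad_e = [sum(w[i][j] * x[i] for i in range(len(x))) for j in range(len(e))]
    return grad_x, grad_e


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def matching_aware_gradient_penalty(pairs, w, k=2.0, p=6.0):
    """Penalty k * mean((||dD/dx|| + ||dD/de||)^p) over real, matching pairs.

    pairs: list of (image_features, text_features) for real matching data.
    k, p:  assumed hyperparameters (DF-GAN defaults), not from this paper.
    """
    total = 0.0
    for x, e in pairs:
        grad_x, grad_e = bilinear_grads(x, w, e)
        total += (l2_norm(grad_x) + l2_norm(grad_e)) ** p
    return k * total / len(pairs)
```

In a real model the gradients would come from automatic differentiation of the discriminator with respect to the image and the sentence embedding. The penalty is applied only at real, text-matched points, which smooths the discriminator surface around the target data.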