MGF-GAN: Multi Granularity Text Feature Fusion for Text-guided-Image Synthesis

Xingfu Wang, Xiangyu Li, Ammar Hawbani, Liang Zhao, Saeed Hamood Alsamhi
DOI: https://doi.org/10.1109/trustcom56396.2022.00197
2022-01-01
Abstract: We present our research on the challenging problem of text-to-image synthesis. Our analysis of popular prior work shows that it often uses stacked structures to build generative adversarial network (GAN) models and usually introduces multiple generator-discriminator pairs. The entanglement between different generators degrades the quality of the final synthesized image. Some researchers have proposed single-stage network models to avoid the pitfalls of multiple generators, but these models do not exploit unstructured natural language information at different granularities. To address this defect, we propose MGF-GAN, a multi-granularity feature network that brings text information at different granularities into play while retaining the advantages of the single-stage network. Specifically, we feed three granularities of text features (sentences, aspect words, and single words) into different stages of the model through spatial attention and channel attention mechanisms, gradually refining the synthesized image from global and local perspectives. In addition, we reconstruct the loss function around a contrastive formulation to stabilize training and to keep the synthesized image semantically consistent with the natural language description. We conducted validation experiments on the CUB bird and COCO datasets. The substantial gains observed demonstrate the effectiveness and advancement of MGF-GAN.
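
The contrastive idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss used in MGF-GAN, so the code below is only an assumption: a generic symmetric InfoNCE-style loss over paired image and text embeddings, written with NumPy. The function name `contrastive_loss`, the `temperature` parameter, and the embedding shapes are hypothetical and chosen for illustration.

```python
import numpy as np


def contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss for a batch of paired embeddings.

    Hypothetical illustration only; not the loss defined in the paper.
    Row i of img_emb is assumed to be paired with row i of txt_emb.
    """
    # L2-normalize so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, shape (batch, batch).
    logits = img @ txt.T / temperature

    def diag_cross_entropy(scores):
        # Row-wise log-softmax, then take the matched (diagonal) entries.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))
    print("matched pairs:   ", contrastive_loss(emb, emb))
    print("mismatched pairs:", contrastive_loss(emb, np.roll(emb, 1, axis=0)))
```

Under this assumed formulation, correctly paired image and text embeddings give a lower loss than mismatched pairs, which is the property a consistency-enforcing loss needs. How the paper combines such a term with the adversarial objective is not stated in the abstract.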