Bridging The Domain Gap Arising from Text Description Differences for Stable Text-To-Image Generation.

Tian Tan, Weimin Tan, Xuhao Jiang, Yueming Jiang, Bo Yan
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447092
2024-01-01
Abstract: Generating high-quality images that conform to the semantics of captions has numerous potential applications. However, text-to-image generation is a challenging task due to its cross-modality nature. Current generative models are typically unstable, meaning that complex sentences can result in poor image quality. In this paper, we propose a novel model to bridge the domain gap arising from sentence complexity to achieve stable text-to-image generation. Our model includes two key modules: the attribute extraction module and the attribute fusion module. These modules extract attributes from the captions and fuse them with image features to encourage the model to accurately understand the semantics. Our modules are plug-and-play, and extensive experiments demonstrate that our approach outperforms the state-of-the-art GAN model. Our code and trained model are available at https://github.com/tantian21/stable-t2i-generation.