Towards photorealistic face generation using text-guided Semantic-Spatial FaceGAN

DOI: https://doi.org/10.1007/s11042-024-19320-7
IF: 2.577
2024-05-16
Multimedia Tools and Applications
Abstract: In this paper, we propose a simple yet effective Text-To-Face (T2F) generative adversarial network named Semantic-Spatial FaceGAN (SS-FaceGAN), which addresses the challenge of generating facial images from natural language descriptions. Natural language is inherently abstract, whereas images are concrete. This discrepancy poses a significant challenge, especially when multiple descriptions are used to generate an accurate image. To overcome this issue, SS-FaceGAN is designed to generate precise features from multiple descriptions. Additionally, we incorporate a novel Focus Spatial (FS) module that predicts masks based on text semantics to refine image feature mapping. We also introduce an attention mechanism, the Word Attention Reuse (WAR) module, which leverages the potential distribution of each word in the description to compute word-level attention. Finally, our experiments demonstrate the effectiveness of our approach.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering