Progressive Text-to-Face Synthesis with Generative Adversarial Network

Xing Qiao, Yanghong Han, Yan Wu, Zili Zhang
DOI: https://doi.org/10.1109/fg52635.2021.9667004
2021-01-01
Abstract: Text-to-Face synthesis poses considerable challenges and holds considerable potential in the field of public safety. Compared with general Text-to-Image synthesis, the text descriptions of facial features are more complex and diverse. For text embedding, most previous Text-to-Face synthesis models handle only a single sentence containing several features of a face image, and the generated images are vague and lack detail. In this paper, a novel Progressive Text-to-Face synthesis with Generative Adversarial Network (PFGAN) is proposed to generate natural face images from text descriptions. Firstly, a new text encoding method, Convolution-Deconvolution Word Embedding LSTM (CDWE-BLSTM), is used as the text encoder; it handles more complex sentences and improves the accuracy of text encoding. Secondly, the PFGAN is composed of multiple generators and discriminators arranged in a tree-like structure. Face images at multiple scales, all corresponding to the same description, are progressively generated from different branches of the tree. Extensive experiments comparing PFGAN with three existing Text-to-Face synthesis methods demonstrate that it is very competitive in IS (Inception Score), FID (Fréchet Inception Distance), and the resolution of the generated face images.
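The tree-like, multi-scale generation described in the abstract can be pictured with a small structural sketch. This is not the authors' implementation: the learned generator branches are replaced by a hypothetical placeholder function (`refine`), the text encoder output is stood in for by a plain list of floats (`text_code`), and the base size and number of branches are assumed values chosen only for illustration. The sketch shows just the control flow, in which each branch receives the upsampled output of the previous one and emits an image at a larger scale for the same text code.

```python
import random


def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a square 2-D list of floats."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out


def refine(img, text_code, rng):
    """Hypothetical stand-in for a learned generator branch.

    A real branch would be a neural network conditioned on the text
    embedding; here we only add a small text-dependent offset plus noise.
    """
    bias = sum(text_code) / len(text_code)
    return [[v + 0.1 * bias + 0.01 * rng.random() for v in row] for row in img]


def progressive_generate(text_code, base_size=4, num_branches=3, seed=0):
    """Return one image per branch, each twice the side length of the last.

    base_size and num_branches are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    img = [[rng.random() for _ in range(base_size)] for _ in range(base_size)]
    outputs = []
    for branch in range(num_branches):
        if branch > 0:
            img = upsample2x(img)          # move to the next, finer scale
        img = refine(img, text_code, rng)  # same text code at every scale
        outputs.append(img)
    return outputs


if __name__ == "__main__":
    images = progressive_generate([0.2, 0.5, 0.3])
    print([len(im) for im in images])
```

In an adversarial setup along the lines the abstract describes, each scale's output would additionally be scored by its own discriminator; that part is omitted here because the abstract gives no detail about the losses.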