Generative Adversarial Networks for text-to-face synthesis & generation: A quantitative–qualitative analysis of Natural Language Processing encoders for Spanish

Eduardo Yauri-Lozano, Manuel Castillo-Cara, Luis Orozco-Barbosa, Raúl García-Castro
DOI: https://doi.org/10.1016/j.ipm.2024.103667
IF: 7.466
2024-05-01
Information Processing & Management
Abstract: In recent years, the development of Natural Language Processing (NLP) text-to-face encoders and Generative Adversarial Networks (GANs) has enabled the synthesis and generation of facial images from textual description. However, most encoders have been developed for the English language. This work presents the first study of three text-to-face encoders, namely, the RoBERTa pre-trained model and the Sent2Vec and RoBERTa models, trained with the CelebA dataset in Spanish. It then introduces customised and fine-tuned conditional Deep Convolutional Generative Adversarial Networks (cDCGANs) trained with the CelebA dataset for text-to-face generation in Spanish. To validate the results obtained, a qualitative evaluation was carried out with a visual analysis and a quantitative evaluation based on the IS, FID and LPIPS metrics. Our findings show promising results with respect to the literature, improving the numerical metrics of FID and LPIPS by 5% and 37%, respectively. Our results also show, through a quantitative–qualitative comparison of the cDCGAN training epochs, that the IS metric is not a reliable objective metric to be considered in the evaluation of similar works.
computer science, information systems; information science & library science