Text-driven Face Image Generation and Manipulation via Multi-level Residual Mapper

LI Zong-Lin, ZHANG Sheng-Ping, LIU Yang, ZHANG Zhao-Xin, ZHANG Wei-Gang, HUANG Qing-Ming
DOI: https://doi.org/10.21655/ijsi.1673-7288.00313
2023-01-01
Journal of Software
Abstract: Although Generative Adversarial Networks (GANs) have achieved great success in face image generation and manipulation, discovering meaningful directions in the latent space of GANs to manipulate semantic attributes remains a difficult challenge in computer vision. Meeting this challenge typically requires large amounts of labeled data and several hours of network fine-tuning. However, obtaining an annotated collection of images for each desired manipulation is usually very expensive and time-consuming. Recent works aim to overcome this limitation by leveraging pre-trained models. While promising, these methods do not deliver the manipulation accuracy and result authenticity required in real face editing scenarios. To address these problems, this paper encodes the image and text description into a shared embedding space and proposes a unified image generation and manipulation framework that leverages the powerful joint representation capability of Contrastive Language-Image Pre-training (CLIP). With carefully designed network structures and loss functions, the proposed framework learns a latent residual mapper network that maps the input conditions into corresponding latent code residuals. This scheme enables the proposed method to perform high-quality face image generation and manipulation by leveraging the generative power of the pre-trained StyleGAN2 model. Extensive experiments demonstrate the superiority of the proposed approach in terms of manipulation accuracy, visual realism, and irrelevant attribute preservation.
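The core idea of the abstract, mapping a condition embedding to a residual that is added to a latent code, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an 18 x 512 W+ latent (as used by StyleGAN2 at 1024 x 1024 resolution), a 512-dimensional CLIP embedding, and a hypothetical coarse/medium/fine grouping of layers. The names `LevelMapper`, `apply_residual_mapper`, and `LEVELS` are illustrative, and random weights and random vectors stand in for the trained mapper, the CLIP encoder output, and the inverted face latent.

```python
import numpy as np

NUM_LAYERS, LATENT_DIM, EMBED_DIM = 18, 512, 512  # assumed W+ and CLIP sizes
# Hypothetical coarse / medium / fine split of the W+ layers (an assumption).
LEVELS = {"coarse": slice(0, 4), "medium": slice(4, 8), "fine": slice(8, 18)}


class LevelMapper:
    """Tiny two-layer MLP standing in for one trained mapper level."""

    def __init__(self, rng, hidden=64):
        in_dim = LATENT_DIM + EMBED_DIM
        self.w1 = rng.normal(0.0, 0.02, size=(in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, size=(hidden, LATENT_DIM))

    def __call__(self, w_layers, cond):
        # Attach the same condition embedding to every layer's latent code.
        cond_tiled = np.broadcast_to(cond, (w_layers.shape[0], EMBED_DIM))
        x = np.concatenate([w_layers, cond_tiled], axis=1)
        h = np.maximum(x @ self.w1, 0.0)  # ReLU
        return h @ self.w2  # one residual vector per layer in this level


def apply_residual_mapper(w_plus, cond, mappers, strength=1.0):
    """Return w_plus + strength * residual, with the residual built per level."""
    residual = np.zeros_like(w_plus)
    for name, layer_slice in LEVELS.items():
        residual[layer_slice] = mappers[name](w_plus[layer_slice], cond)
    return w_plus + strength * residual


rng = np.random.default_rng(0)
mappers = {name: LevelMapper(rng) for name in LEVELS}
w_plus = rng.normal(size=(NUM_LAYERS, LATENT_DIM))  # stand-in for a face latent
text_embedding = rng.normal(size=EMBED_DIM)         # stand-in for a CLIP text embedding
text_embedding /= np.linalg.norm(text_embedding)    # unit-normalize the embedding

# In a full pipeline, `edited` would be passed to a pre-trained generator.
edited = apply_residual_mapper(w_plus, text_embedding, mappers)
```

Because the mapper predicts a residual rather than a full latent code, setting `strength=0.0` returns the original latent unchanged, which is one plausible reason a residual formulation helps preserve attributes unrelated to the text.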