CRFAST: Clip-Based Reference-Guided Facial Image Semantic Transfer

Ailin Li, Lei Zhao, Zhiwen Zuo, Zhizhong Wang, Wei Xing, Dongming Lu
DOI: https://doi.org/10.1109/ICASSP49357.2023.10097108
2023-01-01
Abstract: This paper presents a new task, CLIP-based reference-guided facial image semantic transfer: the source facial image is translated into an output image that carries the high-level semantic attributes of the reference image while preserving identity. To this end, we combine the powerful generative capability of the StyleGAN generator with the rich semantic knowledge of the CLIP encoder. A novel contrastive loss is designed to comprehensively exploit CLIP's rich semantic information about facial semantic concepts; this loss guides the semantic transfer toward the desired directions from different perspectives in the pre-defined CLIP space. In addition, a simple yet effective semantic-preserved modulation module is proposed to explicitly map the CLIP embeddings of the reference image to the latent space. Experiments demonstrate that our approach achieves realistic facial image semantic transfer driven by reference images with various facial semantics.
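The abstract does not spell out the contrastive loss, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss over embedding vectors, not the formulation used in the paper. It illustrates the general idea of pulling an output embedding toward a reference embedding and away from others. The function names (`normalize`, `cosine`, `info_nce`), the temperature value, and the toy three-dimensional vectors standing in for CLIP embeddings are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss (an illustrative stand-in, not the paper's loss).

    The loss is small when `anchor` is closer to `positive` than to
    every vector in `negatives`, measured by cosine similarity.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy vectors standing in for image embeddings (hypothetical values).
reference_emb = [1.0, 0.0, 0.0]
unrelated_emb = [0.0, 1.0, 0.0]
output_emb = [0.9, 0.1, 0.0]  # an output that is close to the reference

loss_good = info_nce(output_emb, reference_emb, [unrelated_emb])
loss_bad = info_nce(unrelated_emb, reference_emb, [unrelated_emb])
print(loss_good < loss_bad)
```

In this toy setting the output that lies near the reference embedding receives a lower loss than one that coincides with the negative, which is the behavior a contrastive objective of this general kind is meant to reward.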