Face Transformer: Towards High Fidelity and Accurate Face Swapping

Kaiwen Cui, Rongliang Wu, Fangneng Zhan, Shijian Lu
2023-04-05
Abstract: Face swapping aims to generate swapped images that fuse the identity of source faces and the attributes of target faces. Most existing works address this challenging task through 3D modelling or generation using generative adversarial networks (GANs), but 3D modelling suffers from limited reconstruction accuracy and GANs often struggle in preserving subtle yet important identity details of source faces (e.g., skin colors, face features) and structural attributes of target faces (e.g., face shapes, facial expressions). This paper presents Face Transformer, a novel face swapping network that can accurately preserve source identities and target attributes simultaneously in the swapped face images. We introduce a transformer network for the face swapping task, which learns high-quality semantic-aware correspondence between source and target faces and maps identity features of source faces to the corresponding region in target faces. The high-quality semantic-aware correspondence enables smooth and accurate transfer of source identity information with minimal modification of target shapes and expressions. In addition, our Face Transformer incorporates a multi-scale transformation mechanism for preserving the rich fine facial details. Extensive experiments show that our Face Transformer achieves superior face swapping performance qualitatively and quantitatively.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of swapping faces with both high fidelity and accuracy: transferring the identity of the source face (such as skin color and facial features) onto the target face while preserving the target's attributes (such as head pose and facial expression). Existing methods fall into two groups, each with a known weakness. Approaches based on 3D modelling are limited by reconstruction accuracy, and approaches based on generative adversarial networks (GANs) often struggle to preserve subtle yet important identity details of the source face and structural attributes of the target face, such as face shape and expression. To overcome these limitations, the paper proposes Face Transformer, which introduces a transformer network to the face swapping task. The network learns a semantic-aware correspondence between the source and target faces and uses it to map the source's identity features to the corresponding regions of the target face, which allows smooth and accurate identity transfer with minimal modification of the target's shape and expression. A multi-scale transformation mechanism additionally preserves fine facial details. The authors report superior face swapping performance compared to existing methods, both qualitatively and quantitatively, on multiple public datasets.
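The correspondence idea described above can be illustrated with a small attention-style sketch. This is not the authors' implementation: it assumes that source and target faces have already been encoded into flat token features, that correspondence is computed as a softmax over cosine similarity, and that the same features serve as both matching keys and identity values. The function names (`warp_source_to_target`, `multi_scale_warp`) and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def warp_source_to_target(src_feat, tgt_feat, temperature=0.01):
    """Map source features onto target locations via soft correspondence.

    src_feat: (N_src, C) array of source tokens (assumed identity features).
    tgt_feat: (N_tgt, C) array of target tokens (assumed attribute features).
    Returns (warped, attn): warped is (N_tgt, C), attn is (N_tgt, N_src).
    """
    # L2-normalise so that the dot product below is a cosine similarity.
    src_n = src_feat / (np.linalg.norm(src_feat, axis=1, keepdims=True) + 1e-8)
    tgt_n = tgt_feat / (np.linalg.norm(tgt_feat, axis=1, keepdims=True) + 1e-8)

    # Each target token queries every source token.
    sim = tgt_n @ src_n.T

    # A low temperature sharpens the correspondence towards the best match.
    attn = softmax(sim / temperature, axis=1)

    # Each target location receives a weighted mix of source features.
    warped = attn @ src_feat
    return warped, attn


def multi_scale_warp(src_pyramid, tgt_pyramid, temperature=0.01):
    """Apply the same correspondence warp independently at each scale."""
    return [
        warp_source_to_target(s, t, temperature)[0]
        for s, t in zip(src_pyramid, tgt_pyramid)
    ]
```

As a quick sanity check, if the target tokens are a shuffled copy of the source tokens, the strongest entry in each row of `attn` should point back to the matching source token. In a real model the features would come from learned encoders, and the warped source features would be fused with target features by a decoder; both steps are omitted here.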