End-to-end Face-swapping via Adaptive Latent Representation Learning

Chenhao Lin, Pengbin Hu, Chao Shen, Qian Li
2023-03-08
Abstract: Taking full advantage of the excellent performance of StyleGAN, style transfer-based face swapping methods have been extensively investigated recently. However, these studies require separate face segmentation and blending modules for successful face swapping, and the fixed selection of the manipulated latent code in these works is reckless, thus degrading face swapping quality, generalizability, and practicability. This paper proposes a novel end-to-end integrated framework for high-resolution, attribute-preserving face swapping via Adaptive Latent Representation Learning. Specifically, we first design a multi-task dual-space face encoder by sharing the underlying feature extraction network to simultaneously complete the facial region perception and face encoding. This encoder enables us to control the face pose and attribute individually, thus enhancing the face swapping quality. Next, we propose an adaptive latent codes swapping module to adaptively learn the mapping between the facial attributes and the latent codes and select effective latent codes for improved retention of facial attributes. Finally, the initial face swapping image generated by StyleGAN2 is blended with the facial region mask generated by our encoder to address the background blur problem. Our framework, which integrates facial perceiving and blending into the end-to-end training and testing process, can achieve highly realistic face swapping on wild faces without segmentation masks. Experimental results demonstrate the superior performance of our approach over state-of-the-art methods.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address several key issues in existing face swapping methods:

1. **High Quality and Generalization**: Existing style transfer-based face swapping methods, while performing well in visual effects, usually require separate face segmentation and fusion modules, which increases complexity and computational cost. Additionally, these methods often have fixed latent code selection for manipulation, leading to decreased face swapping quality, generalization, and practicality.
2. **Naturalness and Background Blur**: Maintaining naturalness and the clarity of the target image background when swapping the content of two faces is a challenge. Existing methods either produce noticeable artifacts or face resolution mismatch issues in the facial region.
3. **Multi-stage Training**: Many existing face swapping methods require multi-stage training schemes, including generating segmentation masks and fusion modules, which are not only complex but also computationally expensive. These methods fail to fully utilize the learning process of segmentation masks, which could be beneficial for face swapping.
4. **Latent Code Selection and Swapping**: Existing methods often have fixed latent code selection and swapping, which leads to some facial attributes not being well learned and swapped, thus affecting attribute retention.

To address these issues, the paper proposes an end-to-end integrated framework that achieves high-resolution and attribute-preserving face swapping through Adaptive Latent Representation Learning (ALL). Specifically, the framework includes the following components:

- **Multi-task Dual-space Encoder (MDE)**: Used to perceive the face swapping area and generate segmentation masks while mapping facial images to the face pose space and attribute space.
- **Adaptive Latent Codes Swapping Module (ALS)**: Used to adaptively select and swap effective latent codes to enhance the retention of facial attributes.
- **Internal Face Fusion Module**: Used to seamlessly connect the swapped face region with the background of the target image.

Through these components, the proposed framework can achieve high-quality, highly realistic face swapping without segmentation masks, and has good generalization and practicality. Experimental results show that this method outperforms existing state-of-the-art methods on multiple datasets.
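
To make the last two steps of the pipeline more concrete, here is a minimal sketch of per-layer latent code swapping followed by mask-based blending. It is not the paper's implementation. The sigmoid-gated interpolation is an assumed stand-in for however the ALS module actually selects latent codes, and the names `adaptive_swap`, `blend_with_mask`, and `gate_logits` are hypothetical. Plain Python lists stand in for tensors, and the gate values that a trained model would learn are simply passed in.

```python
import math


def sigmoid(x):
    """Squash a real-valued logit into a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_swap(w_source, w_target, gate_logits):
    """Mix source and target latent codes layer by layer.

    ASSUMPTION: one scalar gate per latent layer, learned during training.
    A gate near 1 takes that layer's code from the source face; a gate
    near 0 keeps the target's code. The paper's ALS module may differ.
    """
    assert len(w_source) == len(w_target) == len(gate_logits)
    swapped = []
    for ws, wt, logit in zip(w_source, w_target, gate_logits):
        g = sigmoid(logit)
        swapped.append([g * s + (1.0 - g) * t for s, t in zip(ws, wt)])
    return swapped


def blend_with_mask(swapped_img, target_img, mask):
    """Composite the generated face onto the target background.

    mask holds values in [0, 1]: 1 inside the facial region, 0 outside.
    Images are single-channel 2-D lists here for brevity.
    """
    return [
        [m * s + (1.0 - m) * t for s, t, m in zip(row_s, row_t, row_m)]
        for row_s, row_t, row_m in zip(swapped_img, target_img, mask)
    ]


if __name__ == "__main__":
    # Two toy latent layers of dimension 2 (real latent codes are far larger).
    w_src = [[1.0, 1.0], [1.0, 1.0]]
    w_tgt = [[0.0, 0.0], [0.0, 0.0]]
    print(adaptive_swap(w_src, w_tgt, [0.0, 100.0]))

    # One-row "image": the left pixel is face, the right pixel is background.
    print(blend_with_mask([[1.0, 1.0]], [[0.0, 0.0]], [[1.0, 0.0]]))
```

The blending step is ordinary alpha compositing: pixels outside the facial mask are copied from the target image, so its background keeps its original sharpness, which is the property the summary attributes to the fusion module.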