Abstract: Taking full advantage of the excellent performance of StyleGAN, style transfer-based face swapping methods have been extensively investigated recently. However, these studies require separate face segmentation and blending modules for successful face swapping, and the fixed selection of the manipulated latent code in these works is reckless, thus degrading face swapping quality, generalizability, and practicability. This paper proposes a novel end-to-end integrated framework for high-resolution, attribute-preserving face swapping via Adaptive Latent Representation Learning. Specifically, we first design a multi-task dual-space face encoder that shares the underlying feature extraction network to perform facial region perception and face encoding simultaneously. This encoder enables us to control the face pose and attributes individually, thus enhancing the face swapping quality. Next, we propose an adaptive latent codes swapping module that adaptively learns the mapping between the facial attributes and the latent codes and selects effective latent codes for improved retention of facial attributes. Finally, the initial face swapping image generated by StyleGAN2 is blended with the facial region mask generated by our encoder to address the background blur problem. By integrating facial perception and blending into end-to-end training and testing, our framework achieves highly realistic face swapping on in-the-wild faces without segmentation masks. Experimental results demonstrate the superior performance of our approach over state-of-the-art methods.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address several key issues in existing face swapping methods:

1. **High Quality and Generalization**: Existing style transfer-based face swapping methods, while performing well in visual effects, usually require separate face segmentation and fusion modules, which increases complexity and computational cost. Additionally, these methods often have fixed latent code selection for manipulation, leading to decreased face swapping quality, generalization, and practicality.
2. **Naturalness and Background Blur**: Maintaining naturalness and the clarity of the target image background when swapping the content of two faces is a challenge. Existing methods either produce noticeable artifacts or face resolution mismatch issues in the facial region.
3. **Multi-stage Training**: Many existing face swapping methods require multi-stage training schemes, including generating segmentation masks and fusion modules, which are not only complex but also computationally expensive. These methods fail to fully utilize the learning process of segmentation masks, which could be beneficial for face swapping.
4. **Latent Code Selection and Swapping**: Existing methods often have fixed latent code selection and swapping, which leads to some facial attributes not being well learned and swapped, thus affecting attribute retention.

To address these issues, the paper proposes an end-to-end integrated framework that achieves high-resolution and attribute-preserving face swapping through Adaptive Latent Representation Learning (ALL). Specifically, the framework includes the following components:

- **Multi-task Dual-space Encoder (MDE)**: Used to perceive the face swapping area and generate segmentation masks while mapping facial images to the face pose space and attribute space.
- **Adaptive Latent Codes Swapping Module (ALS)**: Used to adaptively select and swap effective latent codes to enhance the retention of facial attributes.
- **Internal Face Fusion Module**: Used to seamlessly connect the swapped face region with the background of the target image.

Through these components, the proposed framework can achieve high-quality, highly realistic face swapping without segmentation masks, and has good generalization and practicality. Experimental results show that this method outperforms existing state-of-the-art methods on multiple datasets.
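
As a rough illustration of the latent code swapping and face fusion ideas described above, the sketch below mixes per-layer latent codes with soft gates and then composites a generated face onto the target image using a soft mask. It is a minimal NumPy sketch under assumptions that do not come from the summary: the function names, the sigmoid gating, the 18 x 512 latent shape, and the linear alpha blend are all hypothetical stand-ins, not the authors' ALS or fusion implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real-valued logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_latent_swap(w_source, w_target, gate_logits):
    """Mix per-layer latent codes with soft gates (hypothetical helper).

    w_source, w_target: arrays of shape (num_layers, dim).
    gate_logits: array of shape (num_layers,), standing in for learned
        parameters. A gate near 1 takes that layer's code from the source,
        a gate near 0 keeps the target's code.
    """
    gates = sigmoid(gate_logits)[:, None]  # (num_layers, 1), broadcast over dim
    return gates * w_source + (1.0 - gates) * w_target


def blend_with_mask(swapped, target, mask):
    """Composite the swapped face onto the target (assumed linear blend).

    swapped, target: images of shape (H, W, 3) with values in [0, 1].
    mask: soft facial-region mask of shape (H, W) with values in [0, 1].
    """
    m = mask[..., None]  # (H, W, 1), broadcast over the colour channels
    return m * swapped + (1.0 - m) * target


# Toy usage with assumed shapes: 18 layers of 512-dimensional codes.
rng = np.random.default_rng(0)
w_src = rng.normal(size=(18, 512))
w_tgt = rng.normal(size=(18, 512))

# Strongly positive logits pick the source code, strongly negative the target.
gate_logits = np.concatenate([np.full(9, 20.0), np.full(9, -20.0)])
w_mix = adaptive_latent_swap(w_src, w_tgt, gate_logits)

# A 4x4 toy image: the mask selects the central 2x2 block as the face region.
swapped = np.ones((4, 4, 3))
target = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = blend_with_mask(swapped, target, mask)
```

In this toy setup the gates are fixed by hand; in a learned system they would be trained so that the choice of which codes to swap adapts to the attributes being preserved, which is the contrast with a fixed selection that the summary draws.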

End-to-end Face-swapping via Adaptive Latent Representation Learning

Region-Aware Face Swapping

High-Fidelity Face Swapping with Style Blending

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment

StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping

ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

High-resolution Face Swapping via Latent Semantics Disentanglement

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

Fine-Grained Face Swapping via Regional GAN Inversion

Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping

An Efficient Attribute-Preserving Framework for Face Swapping

E4S: Fine-grained Face Swapping via Editing With Regional GAN Inversion

ControlFace: Feature Disentangling for Controllable Face Swapping

LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping

StableSwap: Stable Face Swapping in a Shared and Controllable Latent Space

FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping

ShapeEditor: A StyleGAN Encoder for Stable and High Fidelity Face Swapping

ShapeEditer: a StyleGAN Encoder for Face Swapping

Enriching Facial Anti-Spoofing Datasets via an Effective Face Swapping Framework

Identity-Preserving Face Swapping via Dual Surrogate Generative Models