Abstract: Face Forgery Detection (FFD), or Deepfake detection, aims to determine whether a digital face is real or fake. Because different face synthesis algorithms produce diverse forgery patterns, FFD models often overfit to the specific patterns in their training datasets and generalize poorly to unseen forgeries. This challenge requires FFD models to represent complex facial features well and to extract subtle forgery cues. Previous FFD models directly employ existing backbones to represent and extract facial forgery cues, but the critical role of the backbone is often overlooked: its knowledge and capabilities are insufficient to address FFD challenges, which inevitably limits generalization. It is therefore essential to integrate the backbone pre-training configurations and seek practical solutions by revisiting the complete FFD workflow, from backbone pre-training and fine-tuning to inference of discriminant results. Specifically, we analyze the contributions of backbones with different configurations to the FFD task and propose pre-training a ViT backbone with self-supervised learning on real-face datasets, equipping it with superior facial representation capabilities. We then build a competitive backbone fine-tuning framework that strengthens the backbone's ability to extract diverse forgery cues within a competitive learning mechanism. Moreover, we devise a threshold optimization mechanism that utilizes prediction confidence to improve inference reliability. Comprehensive experiments demonstrate that our FFD model with the elaborate backbone achieves excellent performance in FFD and in extra face-related tasks, i.e., presentation attack detection. Code and models are available at https://github.com/zhenglab/FFDBackbone.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address key challenges in Face Forgery Detection (FFD) or Deepfake detection. Specifically, the paper focuses on the following issues:

1. **Overfitting to Specific Forgery Patterns**: Due to the diverse forgery patterns generated by different facial synthesis algorithms, existing FFD models often overfit to specific forgery patterns in the training dataset, resulting in poor generalization to unseen forgery data.
2. **Complex Facial Feature Representation and Subtle Forgery Clue Extraction**: The FFD task requires models to have strong capabilities to represent complex facial features and extract subtle forgery clues. However, existing FFD models typically use existing backbone networks directly, whose knowledge and capabilities are insufficient to meet the challenges of FFD, thus limiting their generalization ability.
3. **Neglect of Backbone Network Importance**: Although backbone networks play a crucial role in the FFD task, their importance is often overlooked, especially in terms of pre-training configurations.

To address these issues, the paper proposes a comprehensive re-examination of the FFD workflow, from backbone network pre-training and fine-tuning to final inference of discriminative results. The specific contributions include:

1. **Proposing an Improved FFD Pipeline**: By re-examining the complete workflow from backbone network pre-training to fine-tuning and inference, a new FFD pipeline is proposed.
2. **Developing and Pre-training Efficient Backbone Networks**: Designing a backbone network that can better represent facial component features, specifically for the FFD task.
3. **Building a Competitive Backbone Network Fine-tuning Framework**: Enhancing the backbone network's ability to extract diverse forgery clues.
4. **Designing a Threshold Optimization Mechanism**: Improving the accuracy and reliability of the backbone network in inference results by integrating prediction probabilities and confidence.
5. **Extensive Experimental Validation**: Demonstrating the superiority and potential of the proposed backbone network in the FFD task and other related tasks (such as presentation attack detection) through extensive experiments.

Through these methods and mechanisms, the paper aims to improve the generalization ability and practical application effectiveness of FFD models.
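
The summary does not spell out how the paper's threshold optimization works. Purely as a generic illustration of the idea of tuning a decision threshold instead of fixing it at 0.5, and not as the authors' mechanism, the sketch below sweeps candidate thresholds over held-out validation scores and keeps the one with the best balanced accuracy. The function names, the toy scores, and the choice of balanced accuracy as the criterion are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def balanced_accuracy(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> float:
    """Mean of true-positive and true-negative rates at a given threshold.

    Assumed convention: label 1 = fake, label 0 = real, and a score
    greater than or equal to the threshold is predicted as fake.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one real and one fake sample")
    return 0.5 * (tp / pos + tn / neg)


def pick_threshold(scores: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, float]:
    """Return (best_threshold, best_balanced_accuracy) on validation data.

    Starts from the default 0.5 and tries every observed score as a
    candidate threshold; ties keep the earlier candidate.
    """
    best_t = 0.5
    best_acc = balanced_accuracy(scores, labels, best_t)
    for t in sorted(set(scores)):
        acc = balanced_accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical validation scores (model's "fake" probability) and labels.
val_scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.80]
val_labels = [0, 0, 0, 1, 1, 1]
threshold, acc = pick_threshold(val_scores, val_labels)
print(f"threshold={threshold:.2f}, balanced accuracy={acc:.2f}")
```

On this toy data the default threshold of 0.5 misses two of the three fakes, while the swept threshold separates the two classes. A confidence-aware variant, as the summary describes for the paper, would additionally weigh how certain each prediction is; that detail is not reconstructed here.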

Face Forgery Detection with Elaborate Backbone