Abstract: With the rapid development of AI-generated content (AIGC) technology, it has become possible to produce realistic fake facial images and videos that deceive human visual perception. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully considered important factors such as content diversity, fairness across ethnicities, and availability of comprehensive labels, to ensure the versatility and convenience of DeepFaceGen. Subsequently, DeepFaceGen is employed in this study to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen.
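The sample counts in the abstract imply that DeepFaceGen is close to class-balanced between real and forged content. This matters when choosing evaluation metrics such as accuracy. The following is a minimal Python check of that arithmetic, using only the two totals reported above. The abstract does not give the breakdown across the 34 generation techniques, so that breakdown is not modeled here.

```python
# Sample counts as reported in the DeepFaceGen abstract.
REAL_SAMPLES = 776_990
FAKE_SAMPLES = 773_812

total = REAL_SAMPLES + FAKE_SAMPLES
fake_share = FAKE_SAMPLES / total
real_share = REAL_SAMPLES / total

print(f"total samples: {total:,}")        # total samples: 1,550,802
print(f"fake share:    {fake_share:.2%}")  # fake share:    49.90%
print(f"real share:    {real_share:.2%}")  # real share:    50.10%
```

With a split this close to 50/50, a trivial detector that always predicts "real" would reach only about 50.1% accuracy on the full set.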

What problem does this paper attempt to address?

The main problem this paper addresses is the shortcomings of current benchmark datasets for evaluating face forgery detection techniques. Existing datasets are limited in generation methods, content diversity, ethnic fairness, and label richness. As a result, they cannot comprehensively evaluate the effectiveness and generalization ability of face forgery detection techniques. To address these issues, the authors constructed a large-scale, general-purpose evaluation benchmark named DeepFaceGen. It aims to evaluate face forgery detection techniques quantitatively and to promote their iterative development.

### Core Problems of the Paper

1. **Evaluation Effectiveness**: Existing face forgery detection techniques lack a large-scale, diverse dataset on which they can be evaluated effectively.
2. **Generalization Ability**: The limitations of existing datasets make it difficult to evaluate how well detection techniques generalize across different generation methods and scenarios.
3. **Diversity and Fairness**: Existing datasets have insufficient content diversity in terms of race, gender, age, and other attributes. They also lack comprehensive coverage of different generation methods.
4. **Label Richness**: The labels of existing datasets are not detailed enough to support multi-angle, multi-level evaluations.

### DeepFaceGen's Solutions

- **Large-scale Dataset**: DeepFaceGen contains 776,990 real face image/video samples and 773,812 forged face image/video samples, generated with 34 mainstream face generation techniques.
- **Diversity and Fairness**: During construction, the authors considered factors such as content diversity and ethnic fairness to ensure the dataset's wide applicability and representativeness.
- **Detailed Labels**: It provides rich label information to support multi-angle, multi-level evaluations.
- **Comprehensive Evaluation**: Through extensive experiments, the authors evaluated 13 mainstream face forgery detection techniques. They analyzed the results in depth along multiple dimensions, such as generation methods, frameworks, and generalization ability.

### Main Contributions

1. **Constructing a Large-scale Face Forgery Evaluation Benchmark**: DeepFaceGen is one of the most comprehensive face forgery evaluation datasets currently available, covering many generation techniques and offering rich label information.
2. **Promoting the Development of Face Forgery Detection Techniques**: Its detailed evaluation and analysis reveal the strengths and weaknesses of existing techniques, providing valuable guidance for future research.
3. **Promoting Research on Fairness and Diversity**: It emphasizes accounting for factors such as race, gender, and age when building a dataset, so that the dataset is broadly representative and fair.

In conclusion, this paper addresses the shortcomings of existing face forgery detection benchmarks by constructing DeepFaceGen. This provides a solid foundation for further research and development in the field.
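The generalization problem described above concerns detectors that are tested on generation methods they never saw during training. One common way to probe it is a leave-one-generator-out split: forgeries from one generation technique are held out for testing only. This is not necessarily the protocol the DeepFaceGen authors use. The Python sketch below makes several assumptions:

- The `Sample` record and its fields (`path`, `is_fake`, `generator`, `modality`) are hypothetical and do not reflect DeepFaceGen's actual label schema.
- The generator names `gen_a` and `gen_b` are placeholders.
- Real samples are split by alternating index only to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical per-sample label record (illustrative field names only,
# not DeepFaceGen's real annotation format).
@dataclass(frozen=True)
class Sample:
    path: str
    is_fake: bool
    generator: Optional[str]  # None for real samples
    modality: str             # "image" or "video"

def leave_one_generator_out(
    samples: List[Sample], held_out: str
) -> Tuple[List[Sample], List[Sample]]:
    """Put forgeries from `held_out` only in the test set.

    Real samples alternate between train and test so both sets contain
    real faces; a real protocol would use a seeded random split instead.
    """
    train: List[Sample] = []
    test: List[Sample] = []
    real_idx = 0
    for s in samples:
        if s.is_fake:
            # Unseen generator -> test only; all other generators -> train.
            (test if s.generator == held_out else train).append(s)
        else:
            (train if real_idx % 2 == 0 else test).append(s)
            real_idx += 1
    return train, test

# Usage with hypothetical file paths and generator names.
samples = [
    Sample("real/0001.png", False, None, "image"),
    Sample("real/0002.png", False, None, "image"),
    Sample("fake/gen_a/0001.png", True, "gen_a", "image"),
    Sample("fake/gen_b/0001.mp4", True, "gen_b", "video"),
]
train, test = leave_one_generator_out(samples, held_out="gen_b")
print([s.path for s in train])
print([s.path for s in test])
```

Repeating the split once per generation technique and averaging a detector's score over the held-out runs gives one rough measure of cross-generator generalization.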