Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method

Mian Zou, Baosheng Yu, Yibing Zhan, Siwei Lyu, Kede Ma
2024-05-14
Abstract: In recent years, deep learning has greatly streamlined the process of generating realistic fake face images. Aware of the dangers, researchers have developed various tools to spot these counterfeits. Yet none asked the fundamental question: What digital manipulations make a real photographic face image fake, while others do not? In this paper, we put face forgery in a semantic context and define that computational methods that alter semantic face attributes to exceed human discrimination thresholds are sources of face forgery. Guided by our new definition, we construct a large face forgery image dataset, where each image is associated with a set of labels organized in a hierarchical graph. Our dataset enables two new testing protocols to probe the generalization of face forgery detectors. Moreover, we propose a semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task (i.e., real or fake face detection). We show that the proposed dataset successfully exposes the weaknesses of current detectors as the test set and consistently improves their generalizability as the training set. Additionally, we demonstrate the superiority of our semantics-oriented method over traditional binary and multi-class classification-based detectors.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
The problem this paper addresses is a definitional one in face forgery detection: which digital manipulations turn a real photographic face image into a fake, and which do not? The authors rethink the concept of face forgery and propose a new definition: **computational methods that alter semantic face attributes beyond human discrimination thresholds are sources of face forgery**. Guided by this definition, they construct a new dataset (Face Forgery in the Semantic Context, FFSC) and propose a semantics-oriented face forgery detection method.

### Main contributions:

1. **New definition**: A definition of face forgery grounded in semantic context, which emphasizes the role of semantic face attributes.
2. **New dataset**: A large-scale face forgery image dataset (FFSC) in which each image carries labels organized in a semantic hierarchy.
3. **New detection method**: A semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task (i.e., real or fake face detection).

### Background and motivation:

In recent years, deep learning has greatly simplified the process of generating realistic fake face images. Although researchers have developed a variety of tools to identify these forgeries, existing detectors still generalize poorly to unseen face manipulations. The paper argues that without a clear criterion for which manipulations make a real face image fake, the generalization of face forgery detectors cannot be meaningfully discussed.

### Method overview:

1. **Dataset construction**:
   - Collected high-resolution videos from two popular datasets (AVSpeech and Celeb-DF YouTube-real) and extracted 63,344 real face images from them.
   - Applied 12 different face manipulation methods to these images, generating 75,176 forged images.
   - Associated each image with a set of semantic labels, which are organized in a hierarchical acyclic graph.
2. **Semantics-oriented detection method**:
   - Compute the joint probability distribution over the label hierarchy to encode label relations.
   - Derive the marginal probability of each label, each of which corresponds to a standard binary classification task.
   - Use bi-level optimization to prioritize the primary task (i.e., real or fake face detection), which encourages learning features that transfer across manipulations.

### Experimental results:

- Used as the test set, the proposed FFSC dataset exposes the weaknesses of current face forgery detectors; used as the training set, it consistently improves their generalization.
- The proposed semantics-oriented detection method outperforms traditional binary and multi-class classification-based detectors.

### Conclusion:

By redefining face forgery and constructing a new dataset, this research offers a new direction for improving the generalization of face forgery detectors. The proposed semantics-oriented detection method outperforms traditional methods and suggests new ideas for future research.
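
To make the label-hierarchy step of the method overview more concrete, the following is a minimal sketch of how a joint distribution over hierarchically related labels can be turned into per-label marginals. It is not the authors' implementation. The label names (`fake`, `expression_edit`, `identity_swap`), the rule that a child label may be active only when its parent is active, and the exponential scoring of label assignments are all assumptions made for illustration.

```python
import itertools
import math

# Hypothetical hierarchy (illustrative labels, not taken from the paper).
# Each entry maps a label to its parent; None marks the root.
PARENT = {
    "fake": None,
    "expression_edit": "fake",
    "identity_swap": "fake",
}
LABELS = list(PARENT)


def is_legal(state):
    """Assumed constraint: an active label requires an active parent."""
    return all(
        not on or PARENT[label] is None or state[PARENT[label]]
        for label, on in state.items()
    )


def legal_states():
    """Enumerate every binary label assignment that respects the hierarchy."""
    for bits in itertools.product([0, 1], repeat=len(LABELS)):
        state = dict(zip(LABELS, bits))
        if is_legal(state):
            yield state


def joint_distribution(scores):
    """Joint probability over legal states.

    Assumption: each state's weight is exp(sum of the scores of its
    active labels), normalized over the legal states only.
    """
    states = list(legal_states())
    weights = [
        math.exp(sum(scores[label] for label in LABELS if state[label]))
        for state in states
    ]
    total = sum(weights)
    return [(state, weight / total) for state, weight in zip(states, weights)]


def marginals(scores):
    """Marginal probability of each label: sum over legal states where it is on."""
    joint = joint_distribution(scores)
    return {
        label: sum(prob for state, prob in joint if state[label])
        for label in LABELS
    }


if __name__ == "__main__":
    # Scores would come from a detector's output layer; these are made up.
    example_scores = {"fake": 1.2, "expression_edit": -0.3, "identity_swap": 0.8}
    for label, prob in marginals(example_scores).items():
        print(f"P({label} = 1) = {prob:.3f}")
```

Under these assumptions, the marginal of a child label can never exceed the marginal of `fake`, because every legal state that activates a child also activates its parent. The marginal for `fake` plays the role of the primary real-or-fake prediction; the bi-level optimization that prioritizes it during training is not sketched here.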