Towards the Detection of AI-Synthesized Human Face Images

Yuhang Lu, Touradj Ebrahimi
2024-02-14
Abstract: Over the past years, image generation and manipulation have achieved remarkable progress due to the rapid development of generative AI based on deep learning. Recent studies have devoted significant efforts to address the problem of face image manipulation caused by deepfake techniques. However, the problem of detecting purely synthesized face images has been explored to a lesser extent. In particular, the recent popular Diffusion Models (DMs) have shown remarkable success in image synthesis. Existing detectors struggle to generalize between synthesized images created by different generative models. In this work, a comprehensive benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs has been established to evaluate both the generalization ability and robustness of state-of-the-art detectors. Then, the forgery traces introduced by different generative models have been analyzed in the frequency domain to draw various insights. The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the detection of fully AI-synthesized face images. Existing research focuses mainly on detecting face image manipulation produced by deepfake techniques, while the detection of fully synthesized face images has received less attention. Diffusion models (DMs) in particular have achieved significant success in image synthesis in recent years, yet existing detectors generalize poorly across synthetic images created by different generative models. The paper therefore establishes a comprehensive benchmark to evaluate the generalization ability and robustness of existing detectors, and explores how frequency-domain analysis can improve detector performance.

### Main Contributions:
1. **Establishing a New Benchmark**: Systematically generated a large number of synthetic face images with seven popular generative models, three Generative Adversarial Networks (GANs) and four Diffusion Models (DMs), to evaluate the generalization ability and robustness of detectors.
2. **Frequency Domain Analysis**: Analysis of the frequency spectrum of synthetic face images shows that they differ significantly from real images in the frequency domain. Experimental results show that training detectors on frequency representations can significantly improve their performance and generalization ability.
3. **Evaluating Existing Detectors**: Evaluated several existing learning-based detectors, both for their generalization ability on images created by different generative models and for their robustness to common image perturbations.

### Research Background:
- **Development of Generative Models**: Generative Adversarial Networks (GANs) and Diffusion Models (DMs) have made significant progress in image synthesis and can generate highly realistic synthetic images.
- **Detection Needs**: With the widespread use of synthetic images, detecting them effectively has become an important research topic. Existing detection methods mainly target specific types of generative models and have limited generalization ability.
- **Frequency Domain Analysis**: Research shows that synthetic images have specific artifacts in the frequency domain, which can be used as a basis for detection.

### Experimental Design:
- **Dataset**: Collected real images and synthetic face images created by seven generative models, with each generative technique generating 40,000 images, divided into training, validation, and test sets.
- **Detectors**: Selected several existing learning-based detectors, including models based on ResNet-50, XceptionNet, and EfficientNetB4, as well as some pre-trained detectors.
- **Evaluation Metrics**: Used Average Precision (AP) and Area Under the Receiver Operating Characteristic Curve (AUC) to evaluate the performance of the detectors.

### Experimental Results:
- **Performance of Existing Detectors**: Most existing detectors perform poorly when generalizing to synthetic images created by different generative models, especially images generated by diffusion models.
- **Advantages of Frequency Domain Representation**: Detectors trained on frequency domain representations show excellent generalization ability and robustness, particularly on images created by different generative models.
- **Robustness Evaluation**: Detectors differ in their robustness to common image perturbations (such as JPEG compression, Gaussian blur, Gaussian noise, and scaling), with the Mandelli2022 detector performing best under JPEG compression and Gaussian blur.

### Conclusion:
This paper establishes a comprehensive benchmark to evaluate the performance of existing detectors in the task of detecting fully AI-synthesized face images.
The research results show that training detectors using frequency domain representations can significantly improve their generalization ability and robustness, providing new directions for future research.
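
To make the idea of a frequency representation concrete, here is a minimal sketch of one common choice: the centred log-magnitude spectrum of a grayscale image, plus its radially averaged profile. This summary does not specify the exact representation used in the paper, so the function names `log_spectrum` and `radial_profile`, the grayscale input, and the `log1p` scaling are all illustrative assumptions rather than the authors' pipeline.

```python
import numpy as np


def log_spectrum(image: np.ndarray) -> np.ndarray:
    """Centred log-magnitude spectrum of a 2-D grayscale image (illustrative)."""
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    # 2-D DFT, then move the zero-frequency (DC) term to the centre of the array.
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))
    # log1p compresses the dynamic range and is well defined at zero magnitude.
    return np.log1p(np.abs(spectrum))


def radial_profile(spectrum: np.ndarray) -> np.ndarray:
    """Average a centred spectrum over rings of equal integer radius."""
    h, w = spectrum.shape
    y, x = np.indices((h, w))
    # Integer distance of every frequency bin from the centred DC term.
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_image = rng.random((64, 64))  # stand-in for a real face crop
    profile = radial_profile(log_spectrum(fake_image))
    print(profile.shape)
```

Under these assumptions, either the 2-D spectrum or the 1-D profile could be fed to a classifier in place of raw pixels. Comparing the high-frequency end of the profile between real and generated images is one simple way to look for spectral artifacts of the kind the summary describes.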
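
The two evaluation metrics named above, AP and AUC, can be sketched in a few lines of NumPy. This is an illustration only, not the paper's evaluation code. It assumes label 1 means "synthetic" and that a higher score means "more likely synthetic"; the function names are hypothetical. The AP variant below ranks tied scores in input order, so it can differ slightly from library implementations that group ties by threshold.

```python
import numpy as np


def roc_auc(labels, scores) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Pairwise comparison uses O(n_pos * n_neg) memory; fine for a small example.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def average_precision(labels, scores) -> float:
    """AP as the mean of the precision values at the rank of each positive."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if not labels.any():
        raise ValueError("need at least one positive sample")
    # Sort by descending score; a stable sort keeps tied scores in input order.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())


if __name__ == "__main__":
    y_true = [1, 1, 0, 1, 0, 0]
    y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(roc_auc(y_true, y_score), average_precision(y_true, y_score))
```

Both metrics are threshold-free, which suits cross-generator comparisons: a detector whose scores are miscalibrated on an unseen generator can still rank synthetic images above real ones.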