What problem does this paper attempt to address?

This paper addresses the automatic and efficient detection of face forgery in videos, focusing on two techniques that generate hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics techniques are usually not applicable to videos, because video compression severely degrades the data. The paper therefore adopts a deep-learning approach and proposes two networks with a small number of layers that focus on the mesoscopic properties of images to overcome the challenges introduced by compression. The networks are tested on an existing dataset and on a dataset constructed by the authors, reaching a detection rate of over 98% for Deepfake and 95% for Face2Face.

### Background of Deepfake Detection

With the spread of smartphones and the growth of social networks, digital images and videos have become very common digital objects. Nearly 2 billion pictures are reportedly uploaded to the Internet every day. This volume of use is accompanied by the rise of techniques for tampering with image content, for example with editing software such as Photoshop. The field of digital image forensics is dedicated to detecting image forgeries in order to regulate the spread of false content. Although many methods for detecting image forgeries already exist, video forgery detection remains a difficult problem, mainly because frames are strongly degraded by video compression.

### Deepfake and Face2Face Technologies

- **Deepfake**: Face swapping is achieved by training two auto-encoders. One auto-encoder reconstructs face images of target person A, and the other reconstructs face images of source person B. The two auto-encoders share the weights of the encoding part, but their decoding parts remain independent. This method can generate highly realistic forged videos, but it also has flaws, such as failures when the face is occluded and blurred details.
- **Face2Face**: Facial reenactment is achieved by tracking facial expressions in the source video and the target video in real time, and then synthesizing the expressions of the source video onto the face in the target video. This method does not require deep learning; it relies on traditional computer vision techniques.

### Proposed Method

The paper proposes a deep neural network method based on mesoscopic analysis to detect forged videos generated by Deepfake and Face2Face. Two network architectures are proposed:

- **Meso-4**: Four convolutional and pooling layers, followed by a fully-connected network with one hidden layer. ReLU activations, batch normalization, and Dropout are used to improve generalization and robustness.
- **MesoInception-4**: Based on Meso-4, with the first two convolutional layers replaced by variants of the Inception module. These variants use 3×3 dilated convolutions to avoid introducing high-level semantic information, and add 1×1 convolutions for dimension reduction and for skip connections.

### Experimental Results

- **Deepfake Dataset**: When classifying independent frames, Meso-4 and MesoInception-4 reach accuracies of 89.1% and 91.7% respectively. With image aggregation, the detection rates rise to 96.9% and 98.4%.
- **Face2Face Dataset**: Across the compression levels tested, Meso-4 reaches classification accuracies of 94.6%, 92.4%, and 83.2%, and MesoInception-4 reaches 96.8%, 93.4%, and 81.3%. With image aggregation, the detection rate rises to 95.3%.

### Conclusion

The proposed network architectures achieve a high detection rate for forged videos generated by Deepfake and Face2Face under real-world conditions. By visualizing the layers and filters of the networks, the study found that the eye and mouth regions play a key role in Deepfake detection, while the background region is often more blurred in forged images. Future research aims to improve the understanding of deep networks in order to build more effective and efficient detection methods.
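
The image aggregation step mentioned in the experimental results can be illustrated with a small sketch. The summary does not say how frames are combined, so this example assumes the simplest reading: the per-frame forgery scores of a video are averaged and the mean is compared with a threshold. The function names, the 0.5 threshold, and the convention that a higher score means "forged" are all hypothetical choices made for illustration, not details taken from the paper.

```python
from statistics import mean


def aggregate_video_score(frame_scores):
    """Average per-frame forgery scores into one video-level score.

    Assumption: each score is in [0, 1] and higher means "more likely forged".
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    return mean(frame_scores)


def classify_video(frame_scores, threshold=0.5):
    """Label a video from its averaged frame scores (threshold is hypothetical)."""
    if aggregate_video_score(frame_scores) >= threshold:
        return "forged"
    return "real"


# Hypothetical per-frame outputs of a frame classifier for one video.
# The third frame alone would be labeled "real" at the 0.5 threshold.
scores = [0.9, 0.8, 0.3, 0.7, 0.95]
video_score = aggregate_video_score(scores)
label = classify_video(scores)
```

Under this assumption, a single misclassified frame is outweighed by the other frames of the same video, which is consistent with the video-level detection rates being higher than the frame-level accuracies.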

MesoNet: a Compact Facial Video Forgery Detection Network

Deep Face Forgery Detection

Identify Videos with Facial Manipulations Based on Convolution Neural Network and Dynamic Texture

Multi-Layer Fusion Neural Network for Deepfake Detection.

UniForensics: Face Forgery Detection via General Facial Representation

Temporal Consistency Based Deep Face Forgery Detection Network.

Real-Time Deepfake Video Detection Using Eye Movement Analysis with a Hybrid Deep Learning Approach

MDCF-Net: Multi-Scale Dual-Branch Network for Compressed Face Forgery Detection

Lightweight detection method for deepfake face video

DeepFake detection method based on multi-scale interactive dual-stream network

FakeTransformer: Exposing Face Forgery From Spatial-Temporal Representation Modeled By Facial Pixel Variations

Real-Time Advanced Computational Intelligence for Deep Fake Video Detection

Video Forgery Detection Using Spatio-Temporal Dual Transformer.

Videoforensicshq: Detecting High-Quality Manipulated Face Videos

Deep Fake Detection: Survey of Facial Manipulation Detection Solutions

Multi-feature fusion based face forgery detection with local and global characteristics

Low-complexity Fake Face Detection Based on Forensic Similarity

Detection of Deepfake Videos Using Long-Distance Attention

A Note on Deepfake Detection with Low-Resources

Common Forgery Artifact Driven Deepfake Face Detection

Detection of Deepfake Video Using Residual Neural Network and Long Short-Term Memory