Abstract: With the rapid development of face forgery techniques, existing frame-based deepfake video detection methods face a dilemma: they may fail when encountering extremely realistic images. To overcome this problem, many approaches have attempted to model the spatio-temporal inconsistency of videos to distinguish real videos from fake ones. However, current works model spatio-temporal inconsistency by combining intra-frame and inter-frame information while ignoring the disturbance caused by facial motions, which limits further improvement in detection performance. To address this issue, we investigate long- and short-range inter-frame motions and propose a novel dynamic difference learning method that distinguishes the inter-frame differences caused by face manipulation from those caused by facial motions, in order to model precise spatio-temporal inconsistency for deepfake video detection. Moreover, we design a dynamic fine-grained difference capture module (DFDC-module) and a multi-scale spatio-temporal aggregation module (MSA-module) to collaboratively model spatio-temporal inconsistency. Specifically, the DFDC-module applies a self-attention mechanism and a fine-grained denoising operation to eliminate the differences caused by facial motions and generate long-range difference attention maps. The MSA-module is devised to aggregate multi-direction and multi-scale temporal information to model spatio-temporal inconsistency. Existing 2D CNNs can be extended into dynamic spatio-temporal inconsistency capture networks by integrating the two proposed modules. Extensive experimental results demonstrate that our proposed algorithm consistently outperforms state-of-the-art methods by a clear margin on different benchmark datasets.

Detecting Deepfake Videos Based on Spatiotemporal Attention and Convolutional LSTM

Detection of deepfake technology in images and videos

Dynamic Difference Learning with Spatio-temporal Correlation for Deepfake Video Detection

Deepfake Detection Based on Temporal Analysis of Facial Dynamics Using LSTM and ResNeXt Architectures

Deepfake Video Detection Using 3D-Attentional Inception Convolutional Neural Network

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Deepfake detection: Enhancing performance with spatiotemporal texture and deep learning feature fusion

Improved Xception with Dual Attention Mechanism and Feature Fusion for Face Forgery Detection

Deep Convolutional Pooling Transformer for Deepfake Detection

Deep fake video/image detection using deep learning

Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture

Multi-attentional Deepfake Detection

Deep Fake Face Detection Using Long Short-Term Memory with Deep Learning Approach

A Hybrid CNN-LSTM Approach for Precision Deepfake Image Detection Based on Transfer Learning

Spatio-temporal Features for Generalized Detection of Deepfake Videos

Spatiotemporal Inconsistency Learning for DeepFake Video Detection

A Convolutional LSTM based Residual Network for Deepfake Video Detection

Adt: anti-deepfake transformer

Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection

DeepFake detection algorithm based on improved vision transformer

A Temporal Consistency Learning Framework for Face Forgery Detection