Abstract: Recent advances in facial manipulation techniques threaten the security of web information, and face forgery detection has attracted considerable attention as a result. Both the spatial and the temporal information in facial videos contain crucial manipulation traces, which are inevitably created during the generation process. However, most existing face forgery detectors focus on only the spatial artifacts or only the temporal incoherence, and they struggle to learn significant and general representations of manipulated facial videos. In this work, we propose constructing spatial-temporal graphs for fake videos to capture the spatial inconsistency and the temporal incoherence at the same time. To model the spatial-temporal relationships among the graph nodes, we propose a novel forgery detector named Spatio-Temporal Graph Network (STGN), which contains two kinds of graph-convolution-based units: the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches in the same frame, instead of focusing on low-level local spatial artifacts, which are vulnerable when facing samples created by unseen manipulation methods. The TAGU models the long-distance temporal relations among the patches at the same spatial position in different frames, using a graph attention mechanism based on inter-node similarity. With the SRGU and the TAGU, our STGN combines the discriminative power of spatial inconsistency with the generalization capacity of temporal incoherence for face forgery detection. Our STGN achieves state-of-the-art performance on several popular forgery detection datasets. Extensive experiments demonstrate both the superiority of our STGN in intra-manipulation evaluation and its effectiveness on new kinds of face forgery videos in cross-manipulation evaluation.
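The two relations described in the abstract can be illustrated with a toy sketch. This is not the authors' STGN: it has no learned weights or graph convolutions, and the function names (`spatial_edges`, `temporal_attention`), the use of cosine similarity, and the softmax weighting are all assumptions made for illustration. Patch features are given here as plain lists; in practice they would presumably come from a feature extractor.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_edges(frame):
    """Hypothetical spatial step: score every pair of patches in ONE frame
    by their dissimilarity (1 - cosine), a stand-in for 'inconsistency'."""
    n = len(frame)
    return {(i, j): 1.0 - cosine(frame[i], frame[j])
            for i in range(n) for j in range(i + 1, n)}

def temporal_attention(video, pos):
    """Hypothetical temporal step: for the patch at spatial index `pos`,
    each frame attends to the same position in ALL frames, with weights
    given by a softmax over inter-node (cosine) similarity."""
    track = [frame[pos] for frame in video]  # same position across frames
    out = []
    for q in track:
        weights = softmax([cosine(q, k) for k in track])
        out.append([sum(w * k[d] for w, k in zip(weights, track))
                    for d in range(len(q))])
    return out

# Toy video: 3 frames x 2 patches x 2-dimensional features (made-up values).
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],  # patch 0 changes abruptly in the last frame
]
edges = spatial_edges(video[0])
aggregated = temporal_attention(video, 0)
```

Here `edges` holds one inconsistency score per patch pair within a frame, and `aggregated` holds one attention-weighted feature per frame for the chosen spatial position. A real detector would replace both hand-written scores with learned graph layers.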

Constructing Spatio-Temporal Graphs for Face Forgery Detection
