Abstract: Previous deepfake detection methods mostly depend on low-level textural features, which are vulnerable to perturbations, and fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, so they generalize better. Motivated by this, we propose a detection method that uses high-level semantic features of faces to identify inconsistencies in the temporal domain. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video classification network, initialized with a meta-functional face encoder for enriched facial representation. In this way, we can take advantage of both the powerful spatio-temporal model and the high-level semantic information of faces. Furthermore, to leverage easily accessible real face data and guide the model to focus on spatio-temporal features, we design a Dynamic Video Self-Blending (DVSB) method that efficiently generates training samples with diverse spatio-temporal forgery traces from real facial videos. Based on this, we advance our framework with a two-stage training approach. The first stage employs a novel self-supervised contrastive learning scheme, in which we encourage the network to focus on forgery traces by pushing videos generated by the same forgery process to have similar representations. Building on the representation learned in the first stage, the second stage fine-tunes the network on a face forgery detection dataset to build a deepfake detector. Extensive experiments validate that UniForensics outperforms existing face forgery detection methods in generalization ability and robustness. In particular, our method achieves 95.3% and 77.2% cross-dataset AUC on the challenging Celeb-DFv2 and DFDC datasets, respectively.
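
The first-stage objective described above can be illustrated with a small sketch. The abstract does not specify the exact loss, so the code below assumes a supervised-contrastive-style (InfoNCE-like) formulation in which clips sharing a forgery-process ID are treated as positives. The function names, the `temperature` parameter, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def same_process_contrastive_loss(embeddings, process_ids, temperature=0.1):
    """Assumed SupCon-style loss, not the paper's exact objective.

    Clips with the same process ID are positives for each other;
    every other clip in the batch acts as a negative.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and process_ids[j] == process_ids[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Temperature-scaled cosine similarity to every other clip.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor clips.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: four 2-D "video embeddings" forming two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_grouped = same_process_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mixed = same_process_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is small when the process IDs match the clusters (`loss_grouped`) and large when they cut across them (`loss_mixed`), which is the behavior the first stage relies on to pull same-process videos together.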

Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes

Interpretable Deepfake Detection via Dynamic Prototypes

PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection

ProtoExplorer: Interpretable Forensic Analysis of Deepfake Videos using Prototype Exploration and Refinement

A Temporal Consistency Learning Framework for Face Forgery Detection

Deepfake Detection Based on Temporal Analysis of Facial Dynamics Using LSTM and ResNeXt Architectures

Spatial-temporal Transformer Network for Protecting Person-of-interest from Deepfaking

Double-Stream Segmentation Network with Temporal Self-attention for Deepfake Video Detection

Towards Spatio-temporal Collaborative Learning: An End-to-End Deepfake Video Detection Framework

Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection

Deepfake Video Detection Via Predictive Representation Learning

A defensive framework for deepfake detection under adversarial settings using temporal and spatial features

Exploring varying color spaces through representative forgery learning to improve deepfake detection

Temporal Consistency Based Deep Face Forgery Detection Network

Exploring Static–Dynamic ID Matching and Temporal Static ID Inconsistency for Generalizable Deepfake Detection

Exploiting Complementary Dynamic Incoherence for DeepFake Video Detection

UniForensics: Face Forgery Detection via General Facial Representation

Multi-attentional Deepfake Detection

Combating deepfakes: a comprehensive multilayer deepfake video detection framework

Exposing Deepfake Videos with Spatial, Frequency and Multi-scale Temporal Artifacts