Abstract: Previous deepfake detection methods mostly depend on low-level textural features, which are vulnerable to perturbations, and they fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, so they generalize better. Motivated by this, we propose a detection method that uses high-level semantic features of faces to identify inconsistencies in the temporal domain. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video classification network, initialized with a meta-functional face encoder for enriched facial representation. In this way, we can take advantage of both the powerful spatio-temporal model and the high-level semantic information of faces. Furthermore, to leverage easily accessible real face data and guide the model to focus on spatio-temporal features, we design a Dynamic Video Self-Blending (DVSB) method that efficiently generates training samples with diverse spatio-temporal forgery traces from real facial videos. Building on DVSB, we train the framework in two stages. The first stage employs novel self-supervised contrastive learning, in which we encourage the network to focus on forgery traces by pushing videos generated by the same forgery process toward similar representations. Starting from the representation learned in the first stage, the second stage fine-tunes on a face forgery detection dataset to build a deepfake detector. Extensive experiments validate that UniForensics outperforms existing face forgery detection methods in generalization ability and robustness. In particular, our method achieves 95.3% and 77.2% cross-dataset AUC on the challenging Celeb-DFv2 and DFDC datasets, respectively.
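
The first-stage idea in the abstract, that videos produced by the same forgery process should have similar representations, can be illustrated with a supervised-contrastive-style loss. This is only a minimal sketch under stated assumptions: the abstract does not give the actual loss, so the function name `same_process_contrastive_loss`, the `process_ids` labels, and the `temperature` value are all hypothetical, and the embeddings stand in for the outputs of the video network.

```python
import numpy as np


def same_process_contrastive_loss(embeddings, process_ids, temperature=0.1):
    """Illustrative contrastive loss (an assumption, not the paper's formulation).

    Clips that share a process id are treated as positives for each other;
    every other clip in the batch acts as a negative.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    ids = np.asarray(process_ids)
    n = len(ids)
    if n < 2:
        raise ValueError("need at least two clips in the batch")

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # Exclude each clip's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over the remaining clips in each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same process id, different clip.
    pos_mask = (ids[:, None] == ids[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        raise ValueError("no clip has a positive partner in this batch")

    # Average log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


if __name__ == "__main__":
    # Toy embeddings: clips 0-1 point one way, clips 2-3 another.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(same_process_contrastive_loss(toy, [0, 0, 1, 1]))  # grouping matches
    print(same_process_contrastive_loss(toy, [0, 1, 0, 1]))  # grouping mismatched
```

In the toy usage, the loss is small when clips with the same process id already have aligned embeddings and large when they do not, which is the pressure the first training stage is described as applying.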

Deepfake Video Detection Via Predictive Representation Learning

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Spatio-temporal Features for Generalized Detection of Deepfake Videos

Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

Dynamic Difference Learning with Spatio-temporal Correlation for Deepfake Video Detection

Video Detection Method Based on Temporal and Spatial Foundations for Accurate Verification of Authenticity

An efficient deepfake video detection using robust deep learning

One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework

Undercover Deepfakes: Detecting Fake Segments in Videos

DeepFake detection algorithm based on improved vision transformer

Deepfake Video Detection with Spatiotemporal Dropout Transformer

UniForensics: Face Forgery Detection via General Facial Representation

DeepFake Detection with Inconsistent Head Poses: Reproducibility and Analysis

Deepfake Detection Based on Temporal Analysis of Facial Dynamics Using LSTM and ResNeXt Architectures

FFR_FD: Effective and fast detection of DeepFakes via feature point defects

Learning a Deep Dual-Level Network for Robust DeepFake Detection

Temporal Feature Prediction in Audio–Visual Deepfake Detection

Unearthing Common Inconsistency for Generalisable Deepfake Detection

Multi-feature fusion based face forgery detection with local and global characteristics

Combating deepfakes: a comprehensive multilayer deepfake video detection framework

Real-Time Advanced Computational Intelligence for Deep Fake Video Detection