Abstract: The rapid advancement of deep learning and large-scale AI models has simplified the creation and manipulation of deepfakes, in which faces in images and videos are generated, edited, or replaced. This growing ease of use has made the malicious application of forged faces a significant threat and has complicated the task of deepfake detection. Although current deepfake detection methods, which predominantly employ data-driven CNN classification models, have achieved notable success, they generalize poorly and lack robustness to novel data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. The framework employs a restricted self-attention mechanism that applies self-attention flexibly to deep CNN features, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. The attention mechanism has a modular design and can be integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, effectively isolating forgery indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to combine RGB and frequency information into a comprehensive facial representation. Tested on the high-quality version of the FaceForensics++ dataset, our method attained a detection accuracy of 97.92%, outperforming other approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms the method's robust generalizability, offering a new solution for detecting high-quality deepfake videos.
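To make the idea of restricting self-attention to local regions of a CNN feature map concrete, the following is a minimal sketch and not the ReLAF-Net implementation. It assumes non-overlapping square windows and identity query/key/value projections, neither of which is specified in the abstract; the function name `window_self_attention` and its `window` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(feat, window):
    """Self-attention restricted to non-overlapping window x window regions.

    feat   : H x W x C feature map as nested lists (e.g. a CNN output).
    window : side length of each square window (hypothetical parameter).

    Assumption: queries, keys and values are the features themselves
    (identity projections); a trained model would learn these projections.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    if h % window or w % window:
        raise ValueError("feature map size must be divisible by window")
    scale = 1.0 / math.sqrt(c)  # scaled dot-product attention
    out = [[None] * w for _ in range(h)]
    for top in range(0, h, window):
        for left in range(0, w, window):
            # Collect the positions (tokens) that belong to this window.
            coords = [(y, x)
                      for y in range(top, top + window)
                      for x in range(left, left + window)]
            tokens = [feat[y][x] for y, x in coords]
            for (y, x), q in zip(coords, tokens):
                # Each position attends only to positions in its own window.
                scores = [scale * sum(a * b for a, b in zip(q, k))
                          for k in tokens]
                weights = softmax(scores)
                out[y][x] = [sum(wt * v[ch] for wt, v in zip(weights, tokens))
                             for ch in range(c)]
    return out
```

In this sketch a small `window` captures fine-grained local relationships, while setting `window` to the full feature-map size recovers ordinary global self-attention, which is one simple way to read "fine-grained and global levels".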
