Abstract: The rapid advancement of deep learning and large-scale AI models has made it easier to create deepfakes, in which faces in images and videos are generated, edited, or replaced. This growing ease of use has turned the malicious application of forged faces into a significant threat and has complicated the task of deepfake detection. Current deepfake detection methods, which predominantly rely on data-driven CNN classification models, have achieved notable success, but they generalize poorly and lack robustness against data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. The framework employs a restricted self-attention mechanism that flexibly applies self-attention to deep CNN features, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. The mechanism is modular and can be integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, isolating forgery indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to combine RGB and frequency information into a comprehensive facial representation. On the high-quality version of the FaceForensics++ dataset, our method attains a detection accuracy of 97.92%, outperforming the compared approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms its generalizability, offering a new solution for detecting high-quality deepfake videos.
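
To make the idea of restricting self-attention to local regions of a CNN feature map more concrete, the following is a minimal sketch and not the ReLAF-Net implementation. It assumes non-overlapping square windows, identity query/key/value projections, and a plain nested-list feature map; the function name `window_self_attention` and the `window` parameter are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(feature_map, window):
    """Apply scaled dot-product self-attention inside each window x window patch.

    feature_map: H x W grid of C-dimensional feature vectors (nested lists).
    window: side length of the local region; H and W must be divisible by it.
    Assumption: queries, keys and values are the raw features (identity
    projections). A trained model would learn these projections instead.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")
    scale = 1.0 / math.sqrt(channels)
    out = [[None] * width for _ in range(height)]
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Only positions inside the same window may attend to each other.
            cells = [(r, c)
                     for r in range(top, top + window)
                     for c in range(left, left + window)]
            for qr, qc in cells:
                query = feature_map[qr][qc]
                scores = [
                    scale * sum(q * k for q, k in zip(query, feature_map[kr][kc]))
                    for kr, kc in cells
                ]
                weights = softmax(scores)
                # Output is the attention-weighted average of the window's features.
                out[qr][qc] = [
                    sum(w * feature_map[vr][vc][ch]
                        for w, (vr, vc) in zip(weights, cells))
                    for ch in range(channels)
                ]
    return out


# Toy 4 x 4 feature map with 2 channels (values are illustrative only).
fmap = [[[float(r + c), 1.0] for c in range(4)] for r in range(4)]
local_out = window_self_attention(fmap, window=2)   # fine-grained regions
global_out = window_self_attention(fmap, window=4)  # one window = global attention
```

In this sketch the window size is the only knob: a small window limits each position to its immediate neighbourhood, while a window covering the whole map reduces to ordinary global self-attention. How the paper combines the two scales is not specified in the abstract and is not modelled here.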

Fusing Multi-scale Attention and Transformer for Detection and Localization of Image Splicing Forgery

End-to-end Image Splicing Localization Based on Multi-Scale Features and Residual Refinement Module

Feature Aggregation and Region-Aware Learning for Detection of Splicing Forgery

Cross-attention based two-branch networks for document image forgery localization in the Metaverse

Image Manipulation Localization Using Multi-Scale Feature Fusion and Adaptive Edge Supervision

ET: Edge-Enhanced Transformer for Image Splicing Detection

Multitask Image Splicing Tampering Detection Based on Attention Mechanism

Double-branch forgery image detection based on multi-scale feature fusion

Exploring multi-scale forgery clues for stereo super-resolution image forgery localization

Multi-scale Target-Aware Framework for Constrained Image Splicing Detection and Localization

Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion

CMCF-Net: an End-to-End Context Multiscale Cross-Fusion Network for Robust Copy-Move Forgery Detection

Image‐splicing forgery detection based on local binary patterns of DCT coefficients

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization

Combined spatial and frequency dual stream network for face forgery detection

DMFF-Net: Double-stream multilevel feature fusion network for image forgery localization

Image Manipulation Localization Using Spatial–Channel Fusion Excitation and Fine-Grained Feature Enhancement

Coarse-to-fine spatial-channel-boundary attention network for image copy-move forgery detection

Exposing video surveillance object forgery by combining TSF features and attention-based deep neural networks

Effect of intracellular pH on force and heat production in isometric contraction of frog muscle fibres.