Abstract: This paper presents the dual‐stream frequency‐spatial fusion network for deepfake detection, which integrates spatial‐domain and frequency‐domain features to improve detection accuracy and robustness. The network comprises a spatial forgery feature extraction module, a frequency forgery feature extraction module, and a spatial–frequency feature fusion module, and it uses attention mechanisms to extract and fuse features. Extensive experiments show that the dual‐stream frequency‐spatial fusion network outperforms existing methods, with better generalization and robustness across several deepfake datasets.

In recent years, face forgery detection has attracted significant attention and advanced considerably. However, most existing methods rely on CNNs to extract artefacts from the spatial domain and overlook the frequency‐domain artefacts that are pervasive in deepfake content, which makes robust and generalized detection difficult. To address these issues, we propose the dual‐stream frequency‐spatial fusion network for deepfake detection. The network consists of three components: the spatial forgery feature extraction module, the frequency forgery feature extraction module, and the spatial–frequency feature fusion module. The spatial forgery feature extraction module employs spatial‐channel attention to extract features that target artefacts in the spatial domain. The frequency forgery feature extraction module uses focused linear attention to detect frequency‐domain anomalies in internal regions, enabling the identification of generated content. The spatial–frequency feature fusion module then fuses the forgery features extracted from both domains, which facilitates accurate detection of both splicing artefacts and internally generated forgeries and lets the model capture forgery characteristics more accurately. Extensive experiments on several widely used benchmarks demonstrate that the network exhibits superior generalization and robustness and significantly improves deepfake detection performance.
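
The three-stage data flow described in the abstract (a spatial stream, a frequency stream, and a fusion step) can be illustrated with a deliberately small sketch. This is not the authors' network: the learned spatial‐channel attention and focused linear attention modules are replaced here by hand-crafted statistics, and the fusion step is a simple softmax weighting over feature norms. The function names `spatial_features`, `frequency_features`, and `fuse` are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def spatial_features(img):
    """Toy spatial stream: mean absolute horizontal and vertical pixel differences.

    Assumption: stands in for a learned spatial-domain feature extractor.
    """
    n = len(img)
    horiz = [abs(img[x][y + 1] - img[x][y]) for x in range(n) for y in range(n - 1)]
    vert = [abs(img[x + 1][y] - img[x][y]) for x in range(n - 1) for y in range(n)]
    return [sum(horiz) / len(horiz), sum(vert) / len(vert)]


def frequency_features(img):
    """Toy frequency stream: fraction of 2D DFT energy in low and high bands.

    Assumption: stands in for a learned frequency-domain feature extractor.
    Uses a direct O(n^4) DFT, so it is only suitable for very small square images.
    """
    n = len(img)
    low = high = 0.0
    for u in range(n):
        for v in range(n):
            coeff = sum(
                img[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                for x in range(n)
                for y in range(n)
            )
            energy = abs(coeff) ** 2
            # Fold indices so that index n - 1 counts as frequency 1, then split bands.
            radius = max(min(u, n - u), min(v, n - v))
            if radius > n / 4:
                high += energy
            else:
                low += energy
    total = low + high + 1e-12  # guard against an all-zero image
    return [low / total, high / total]


def fuse(spatial, frequency):
    """Toy fusion: softmax weights over the two feature norms, then a weighted sum.

    Assumption: a fixed, non-learned stand-in for an attention-based fusion module.
    """
    scores = [
        math.sqrt(sum(v * v for v in spatial)),
        math.sqrt(sum(v * v for v in frequency)),
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    weights = [e / sum(exps) for e in exps]
    fused = [weights[0] * s + weights[1] * f for s, f in zip(spatial, frequency)]
    return fused, weights


if __name__ == "__main__":
    # A 4x4 checkerboard has strong high-frequency content; a flat image has none.
    checker = [[(x + y) % 2 for y in range(4)] for x in range(4)]
    fused, weights = fuse(spatial_features(checker), frequency_features(checker))
    print("fused features:", fused)
    print("stream weights:", weights)
```

In a real detector both streams would be trained end to end and the fused features would feed a classifier; the sketch only shows how two complementary views of the same image can be computed separately and then combined with normalized weights.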

Deepfake Detection Based on the Adaptive Fusion of Spatial‐Frequency Features

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Learning spatial‐frequency interaction for generalizable deepfake detection

Spatial-frequency feature fusion based deepfake detection through knowledge distillation

Multi-feature fusion based face forgery detection with local and global characteristics

FFR_FD: Effective and Fast Detection of DeepFakes Based on Feature Point Defects

Noise-aware progressive multi-scale deepfake detection

Face forgery detection by progressively enhancing spatial and frequency-aware features

Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning

Combating deepfakes: a comprehensive multilayer deepfake video detection framework

Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts

FFR_FD: Effective and fast detection of DeepFakes via feature point defects

Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection

Common Forgery Artifact Driven Deepfake Face Detection

A shared updatable method of content regulation for deepfake videos based on blockchain

A defensive framework for deepfake detection under adversarial settings using temporal and spatial features

DeepFake detection method based on multi-scale interactive dual-stream network

Exploring varying color spaces through representative forgery learning to improve deepfake detection

Multi-attentional Deepfake Detection

WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection