Abstract: The proliferation of fake images generated by deepfake techniques has significantly threatened the trustworthiness of digital information, creating a pressing need for face forgery detection. However, because human face images closely resemble one another and artifact information is subtle, most deep face forgery detection methods struggle with incomplete extraction of artifact information, limited performance on low-quality forgeries, and insufficient generalization across datasets. To address these issues, this paper proposes a novel noise-aware multi-scale deepfake detection model. First, a progressive spatial attention module is introduced, which learns two types of spatial feature weights: a boosting weight and a suppression weight. The boosting weight highlights salient regions, while the suppression weight enables the model to capture more subtle artifact information. Through multiple boosting-suppression stages, the proposed model progressively focuses on different facial regions and extracts multi-scale RGB features. Second, a noise-aware two-stream network is introduced, which leverages frequency-domain features and fuses image noise with the multi-scale RGB features. This fusion enhances the model's ability to handle post-processed images. Third, the model learns global features from the multi-modal features through multiple convolutional layers and combines them with local similarity features for deepfake detection, thereby improving robustness. Experimental results on several benchmark databases demonstrate the superiority of the proposed method over state-of-the-art techniques. Our contributions lie in the progressive spatial attention module, which effectively addresses overfitting in CNNs, and in the integration of noise-aware features with multi-scale RGB features. These innovations lead to enhanced accuracy and generalization performance in face forgery detection.
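The boosting-suppression idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the saliency measure (channel mean), the fixed 0.9 threshold, the damping factor, and the function names `boost_suppress_stage` and `progressive_attention` are all assumptions chosen for illustration. In the actual model the weights are learned rather than computed by a fixed rule.

```python
import numpy as np

def boost_suppress_stage(features, threshold=0.9, damping=0.2):
    """One hypothetical boosting-suppression stage on a (C, H, W) feature map.

    All rules here are illustrative assumptions, not the paper's method.
    """
    # Spatial saliency: mean activation over channels, shape (H, W).
    saliency = features.mean(axis=0)
    # Boosting weight: saliency rescaled to roughly [0, 1].
    span = saliency.max() - saliency.min()
    boost = (saliency - saliency.min()) / (span + 1e-8)
    # Suppression weight: damp the most salient locations so that the
    # next stage is pushed towards other, subtler regions.
    suppress = np.where(boost >= threshold, damping, 1.0)
    boosted = features * (1.0 + boost)   # emphasised features of this stage
    suppressed = features * suppress     # input handed to the next stage
    return boosted, suppressed, boost

def progressive_attention(features, num_stages=3):
    """Run several stages; return per-stage features and attention peaks."""
    outputs, peaks = [], []
    current = features
    for _ in range(num_stages):
        boosted, current, boost = boost_suppress_stage(current)
        outputs.append(boosted)
        peak = np.unravel_index(np.argmax(boost), boost.shape)
        peaks.append(tuple(int(i) for i in peak))
    return outputs, peaks

# Toy input: a flat map with one strong and one weaker activation.
toy = np.ones((2, 4, 4))
toy[:, 0, 0] = 10.0
toy[:, 3, 3] = 6.0
stage_features, stage_peaks = progressive_attention(toy, num_stages=2)
```

On this toy input the first stage peaks at the strongest activation, and after that location is damped the second stage peaks at the weaker one, which mirrors the abstract's claim that successive stages attend to different regions.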

Dor: Detecting Deepfake Videos Using the Dissonance Between Intra- and Inter-Frame Maps

FFR_FD: Effective and Fast Detection of DeepFakes Based on Feature Point Defects

FInfer: Frame Inference-Based Deepfake Detection for High-Visual-Quality Videos

Dual-Modality Co-Learning for Unveiling Deepfake in Spatio-Temporal Space

Coherent Adversarial Deepfake Video Generation

Noise-aware progressive multi-scale deepfake detection

Exploiting Complementary Dynamic Incoherence for DeepFake Video Detection

Video Detection Method Based on Temporal and Spatial Foundations for Accurate Verification of Authenticity

Exploring varying color spaces through representative forgery learning to improve deepfake detection

DFCP: Few-Shot DeepFake Detection via Contrastive Pretraining

Multi-feature fusion based face forgery detection with local and global characteristics

Interactive Two-Stream Network Across Modalities for Deepfake Detection

Combating deepfakes: a comprehensive multilayer deepfake video detection framework

FFR_FD: Effective and fast detection of DeepFakes via feature point defects

Dynamic Difference Learning with Spatio-temporal Correlation for Deepfake Video Detection

A defensive framework for deepfake detection under adversarial settings using temporal and spatial features

Optifake: optical flow extraction for deepfake detection using ensemble learning technique

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

Anti-Forensics for Face Swapping Videos via Adversarial Training

D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy

A survey on face forgery detection of Deepfake