Abstract: The challenge of generalization in face forgery detection has become increasingly prominent as manipulation techniques continue to evolve. Although recent image blending-based methods have shown remarkable potential, they often suffer a substantial performance drop on datasets with large domain gaps. This limitation arises because prior methods rely exclusively on blending unaltered faces under various augmentations to produce common artifacts, ignoring the inherent characteristics of genuinely forged regions. To fully exploit the potential of image blending-based methods for generalizable Deepfake detection, we propose a novel image synthesis framework called Bi-Level Inconsistency Generator (Bi-LIG), which introduces bi-level inconsistency into the synthesized images. Specifically, Bi-LIG generates synthetic images by blending source and target images drawn from both pristine and forged image sets, introducing a) Extrinsic-Inconsistency between real and pseudo-forged regions, and b) Inherent-Inconsistency between real and manipulated areas. In this way, Bi-LIG creates a diverse synthesized image set and establishes a generalizable training domain. Furthermore, we propose a novel face forgery detection network named Token Consistency Constrained Vision Transformer, in which two modules are built on patch consistency learning. First, a Patch Token Contrast module learns the bi-level patch inconsistencies. Second, a Progressive Patch Token Assemble module aggregates local patch relations to enhance the inconsistency representations. Experimental results demonstrate the effectiveness and superiority of our method in both in-dataset and cross-dataset evaluations. Notably, our approach outperforms state-of-the-art methods by 5.09% and 10.15% in cross-dataset evaluation on DFDCp and DFDC, respectively.
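The blending step that image blending-based methods share, and the kind of patch-level consistency target a module like Patch Token Contrast could be supervised with, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `blend`, `patch_labels`, and `consistency_target` are hypothetical; the soft-mask alpha blend, output = M * source + (1 - M) * target, is the generic formulation used by blending-based methods; and marking a patch as altered when any mask pixel inside it is nonzero is a simplifying assumption of this sketch. NumPy is assumed to be available.

```python
import numpy as np


def blend(target, source, mask):
    """Generic soft-mask alpha blend (assumed formulation, not Bi-LIG's exact one).

    target, source: float arrays of shape (H, W, C); mask: float array (H, W) in [0, 1].
    Pixels where mask == 1 come from `source`, pixels where mask == 0 from `target`.
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * source + (1.0 - m) * target


def patch_labels(mask, patch):
    """Hypothetical patch labelling: 1 if any mask pixel in a patch is > 0, else 0.

    The mask is cropped to a multiple of `patch`, split into non-overlapping
    (patch x patch) tiles, and flattened in row-major order, matching how a
    ViT typically orders its patch tokens.
    """
    h, w = mask.shape
    cropped = mask[: h - h % patch, : w - w % patch]
    tiles = cropped.reshape(h // patch, patch, w // patch, patch)
    return (tiles.max(axis=(1, 3)) > 0).astype(np.int64).ravel()


def consistency_target(labels):
    """Pairwise target: 1.0 where two patches share a label (consistent), 0.0 otherwise."""
    return (labels[:, None] == labels[None, :]).astype(np.float32)


if __name__ == "__main__":
    target = np.zeros((4, 4, 3))
    source = np.ones((4, 4, 3))
    mask = np.zeros((4, 4))
    mask[:2, :2] = 1.0  # blend only the top-left 2x2 region

    synthesized = blend(target, source, mask)
    labels = patch_labels(mask, patch=2)
    print(labels.tolist())
    print(consistency_target(labels))
```

In a detector, a target matrix of this kind would supervise pairwise similarities between patch tokens. How Bi-LIG combines blending masks with the manipulated regions already present in forged inputs is not specified in the abstract, so the sketch deliberately leaves that step out.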

Bi-source Reconstruction based Classification Network for Face Forgery Video Detection

A Cascade Face Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP

Unified Video and Image Representation for Boosted Video Face Forgery Detection

UniForensics: Face Forgery Detection via General Facial Representation

Forgery-Domain-Supervised Deepfake Detection with Non-Negative Constraint

MDCF-Net: Multi-Scale Dual-Branch Network for Compressed Face Forgery Detection

Multi-feature fusion based face forgery detection with local and global characteristics

Lightweight detection method for deepfake face video

Pixel Bleach Network for Detecting Face Forgery under Compression

Exploring Bi-Level Inconsistency Via Blended Images for Generalizable Face Forgery Detection

Multi-level feature disentanglement network for cross-dataset face forgery detection

Face Forgery Detection with Long-Range Noise Features and Multilevel Frequency-Aware Clues

Deep Face Forgery Detection

Exploring varying color spaces through representative forgery learning to improve deepfake detection

Learning Natural Consistency Representation for Face Forgery Video Detection

Multi-attentional Deepfake Detection

A survey on face forgery detection of Deepfake

MesoNet: a Compact Facial Video Forgery Detection Network

BENet: A Cross-domain Robust Network for Detecting Face Forgeries via Bias Expansion and Latent-space Attention

Low-complexity Fake Face Detection Based on Forensic Similarity

Common Forgery Artifact Driven Deepfake Face Detection