Abstract: The challenge of generalization in face forgery detection has become increasingly prominent as manipulation techniques continue to evolve. Although recent image blending-based methods have demonstrated remarkable potential, they often suffer a marked performance drop on datasets with large domain gaps. This limitation stems from prior methods' exclusive reliance on blending unaltered faces under various augmentations to produce common artifacts, which ignores the inherent characteristics of forged regions. To fully exploit the potential of image blending-based methods for generalizable Deepfake detection, we propose a novel image synthesis framework called Bi-Level Inconsistency Generator (Bi-LIG) that introduces bi-level inconsistency into the synthesized images. Specifically, Bi-LIG generates synthetic images by blending source and target images drawn from both pristine and forged image sets, introducing a) Extrinsic-Inconsistency between real and pseudo-forged regions, and b) Inherent-Inconsistency between real and manipulated areas. In this way, Bi-LIG creates a diverse synthesized image set and establishes a generalizable training domain. Furthermore, we propose a novel face forgery detection network named Token Consistency Constrained Vision Transformer, in which two modules are developed based on patch consistency learning. First, a Patch Token Contrast module learns the bi-level patch inconsistencies. Second, a Progressive Patch Token Assemble module aggregates local patch relations to enhance the inconsistency representations. Experimental results demonstrate the effectiveness and superiority of our method in both in-dataset and cross-dataset evaluations. Notably, our approach outperforms state-of-the-art methods by 5.09% and 10.15% in cross-dataset evaluations on DFDCp and DFDC, respectively.
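To make the blending and patch-consistency ideas concrete, here is a minimal sketch that assumes a plain alpha blend with a given soft mask and non-overlapping 16-pixel patches. The function names (`blend_with_mask`, `patch_labels`, `patch_consistency_target`) and the 0.5 threshold are hypothetical illustrations, not the authors' Bi-LIG or Token Consistency Constrained Vision Transformer implementation. The sketch only models the inconsistency introduced by the blend itself. If the target were a forged face, its own manipulated region would be a second source of inconsistency, and that case is not modeled here.

```python
import numpy as np

def blend_with_mask(source, target, mask):
    """Alpha-blend `source` into `target` with a soft mask in [0, 1].
    source, target: float arrays (H, W, C); mask: float array (H, W)."""
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * source + (1.0 - m) * target

def patch_labels(mask, patch=16, thresh=0.5):
    """Hypothetical patch-level label: 1 if a patch's mean mask value exceeds thresh."""
    h, w = mask.shape
    grid = mask[: h - h % patch, : w - w % patch]  # crop to a whole number of patches
    grid = grid.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return (grid > thresh).astype(np.int64)

def patch_consistency_target(labels):
    """Pairwise target: 1 where two patches share a label (consistent), 0 otherwise."""
    flat = labels.reshape(-1)
    return (flat[:, None] == flat[None, :]).astype(np.float32)

# Usage: paste a square region of `source` into the centre of `target`.
rng = np.random.default_rng(0)
H = W = 64
source = rng.random((H, W, 3))
target = rng.random((H, W, 3))
mask = np.zeros((H, W))
mask[16:48, 16:48] = 1.0
img = blend_with_mask(source, target, mask)
labels = patch_labels(mask, patch=16)          # 4x4 grid; the central 2x2 patches are blended
target_mat = patch_consistency_target(labels)  # 16x16 pairwise consistency matrix
```

A pairwise matrix like `target_mat` is one plausible supervision signal for a patch-token contrast objective, because it marks which patch pairs come from the same (consistent) region. How the paper actually forms this target is not specified in the abstract.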

Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos

Hierarchical Supervisions with Two-Stream Network for Deepfake Detection

Unearthing Common Inconsistency for Generalisable Deepfake Detection

Dynamic Inconsistency-aware DeepFake Video Detection

Mining Generalized Features for Detecting AI-Manipulated Fake Faces

Dynamic Difference Learning with Spatio-temporal Correlation for Deepfake Video Detection

Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection

Multi-feature fusion based face forgery detection with local and global characteristics

Exploring Static–Dynamic ID Matching and Temporal Static ID Inconsistency for Generalizable Deepfake Detection

UniForensics: Face Forgery Detection via General Facial Representation

GM-DF: Generalized Multi-Scenario Deepfake Detection

Exploring Bi-Level Inconsistency Via Blended Images for Generalizable Face Forgery Detection

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

A Temporal Consistency Learning Framework for Face Forgery Detection

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

UCF: Uncovering Common Features for Generalizable Deepfake Detection

Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection

Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection

Decoupling Forgery Semantics for Generalizable Deepfake Detection

Learning a Deep Dual-Level Network for Robust DeepFake Detection

Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption