Exploring Bi-Level Inconsistency Via Blended Images for Generalizable Face Forgery Detection
Peiqi Jiang, Hongtao Xie, Lingyun Yu, Guoqing Jin, Yongdong Zhang
DOI: https://doi.org/10.1109/tifs.2024.3417266
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Generalization has become an increasingly prominent challenge in face forgery detection as manipulation techniques continue to evolve. Although recent image blending-based methods have shown remarkable potential, their performance often drops sharply on datasets with large domain gaps. This limitation stems from prior methods relying exclusively on blending unaltered faces under various augmentations to produce common artifacts, which ignores the inherent characteristics of forged regions. To fully exploit image blending for generalizable Deepfake detection, we propose a novel image synthesis framework, the Bi-Level Inconsistency Generator (Bi-LIG), which introduces bi-level inconsistency into the synthesized images. Specifically, Bi-LIG generates synthetic images by blending source and target images drawn from both pristine and forged image sets, introducing a) Extrinsic-Inconsistency between real and pseudo-forged regions, and b) Inherent-Inconsistency between real and manipulated regions. In this way, Bi-LIG creates a diverse synthesized image set and establishes a generalizable training domain. Furthermore, we propose a novel face forgery detection network, the Token Consistency Constrained Vision Transformer, which contains two modules built on patch consistency learning. First, a Patch Token Contrast module learns the bi-level patch inconsistencies. Second, a Progressive Patch Token Assemble module aggregates local patch relations to enhance the inconsistency representations. Experimental results demonstrate the effectiveness and superiority of our method in both in-dataset and cross-dataset evaluations. Notably, our approach outperforms state-of-the-art methods by 5.09% and 10.15% in cross-dataset evaluations on DFDCp and DFDC, respectively.
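The abstract does not specify Bi-LIG's blending procedure or the Patch Token Contrast objective. As a rough illustration of the general mechanism that blending-based methods share, the sketch below alpha-blends a source image into a target through a mask M (out = M * src + (1 - M) * tgt). It then derives per-patch labels and a pairwise "same region or not" target of the kind a patch-consistency objective could be trained against. This is a minimal sketch under stated assumptions, not the authors' method: the function names, the 0.5 threshold, and the mean-mask labeling rule are hypothetical. It also uses grayscale nested lists instead of tensors, and it does not model the original manipulation masks that forged inputs would contribute to Bi-LIG's Inherent-Inconsistency.

```python
from typing import List

# Hypothetical types: grayscale H x W image and H x W region map, values in [0, 1].
Image = List[List[float]]
RegionMap = List[List[float]]


def blend(source: Image, target: Image, mask: RegionMap) -> Image:
    """Alpha-blend source into target pixel-wise: out = M * src + (1 - M) * tgt."""
    return [
        [m * s + (1.0 - m) * t for s, t, m in zip(srow, trow, mrow)]
        for srow, trow, mrow in zip(source, target, mask)
    ]


def patch_labels(region: RegionMap, patch: int, thresh: float = 0.5) -> List[int]:
    """Label each non-overlapping patch 1 if its mean region value exceeds thresh, else 0.

    Assumes H and W are divisible by `patch`. Labels are returned in row-major
    order, matching the usual ordering of ViT patch tokens.
    """
    h, w = len(region), len(region[0])
    labels = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            vals = [region[y][x] for y in range(i, i + patch) for x in range(j, j + patch)]
            labels.append(1 if sum(vals) / len(vals) > thresh else 0)
    return labels


def consistency_target(labels: List[int]) -> List[List[int]]:
    """Pairwise target: 1 if two patches carry the same label (consistent), else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


# Toy example: a white source blended into a black target through a mask
# that covers the left half of a 4 x 4 image.
src = [[1.0] * 4 for _ in range(4)]
tgt = [[0.0] * 4 for _ in range(4)]
mask = [[1.0 if x < 2 else 0.0 for x in range(4)] for _ in range(4)]

out = blend(src, tgt, mask)         # each row: [1.0, 1.0, 0.0, 0.0]
labels = patch_labels(mask, 2)      # [1, 0, 1, 0]
target = consistency_target(labels) # target[0] == [1, 0, 1, 0]
```

Supervising pairwise patch agreement rather than a single image-level label is what lets a patch-consistency objective localize the boundary between blended and untouched regions; how the paper actually weights or contrasts these pairs is not stated in the abstract.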