A Detail-Aware Transformer to Generalisable Face Forgery Detection

Jiaming Li, Lingyun Yu, Runxin Liu, Hongtao Xie
DOI: https://doi.org/10.1109/tcsvt.2024.3509693
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Generalisable face forgery detectors strive to detect forgeries generated by unseen manipulations. Recent advanced detection methods have managed to capture subtle blending traces, but they neglect the diversity of blending traces across different regions, which limits their generalization. To address this, a transformer with global receptive fields and a dynamic weight mechanism is a promising solution, but the vanilla transformer is weak at capturing subtle blending traces. In this paper, we propose a novel Detail-Aware Transformer (DAT) that focuses on both diverse and subtle blending traces caused by inconsistencies in low-level image details. The intrinsic multi-head self-attention mechanism of the transformer allows our DAT to adaptively capture diverse blending traces in different regions. Furthermore, we improve the transformer's capability of capturing subtle blending traces through two measures that add no inference overhead, i.e., self-supervised pre-training based on patch augmentation and region-level contrastive learning. Specifically, the self-supervised pre-training encourages the model to focus on inconsistencies in low-level image details through a patch number prediction task. The region-level contrastive learning applies a contrastive loss to representations of regions with different low-level details, further improving the transformer's ability to handle subtle blending traces. Extensive experiments show that our method substantially improves generalization performance and outperforms state-of-the-art methods on the CDF, DFDC, DFDCP, FFIW, and WildDeepfake datasets.
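To make the idea of a contrastive loss over region representations concrete, the following is a minimal sketch only. The abstract does not give the loss formulation, so this uses a generic supervised-contrastive-style objective as a stand-in, not the authors' method. The function name `region_contrastive_loss`, the `temperature` value, and the integer region labels (regions sharing a label are assumed to share low-level details) are all hypothetical.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def region_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised-contrastive-style loss over region embeddings.

    Hypothetical stand-in: regions with the same label are pulled together,
    regions with different labels are pushed apart.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity of anchor i to every other region.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor regions.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy region embeddings: two regions near one direction, two near another.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = region_contrastive_loss(regions, [0, 0, 1, 1])
misaligned_loss = region_contrastive_loss(regions, [0, 1, 0, 1])
```

In this toy setup the loss is smaller when the labels agree with the geometry of the embeddings than when they cut across it, which is the behaviour a contrastive objective of this kind is meant to reward.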