DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu

2023-06-02

Abstract: Existing deepfake detection methods fail to generalize well to unseen or degraded samples, which can be attributed to the over-fitting of low-level forgery patterns. Here we argue that high-level semantics are also indispensable recipes for generalizable forgery detection. Recently, large pre-trained Vision Transformers (ViTs) have shown promising generalization capability. In this paper, we propose the first parameter-efficient tuning approach for deepfake detection, namely DeepFake-Adapter, to effectively and efficiently adapt the generalizable high-level semantics from large pre-trained ViTs to aid deepfake detection. Given large pre-trained models but limited deepfake data, DeepFake-Adapter introduces lightweight yet dedicated dual-level adapter modules to a ViT while keeping the model backbone frozen. Specifically, to guide the adaptation process to be aware of both global and local forgery cues of deepfake data, 1) we not only insert Globally-aware Bottleneck Adapters in parallel to MLP layers of ViT, 2) but also actively cross-attend Locally-aware Spatial Adapters with features from ViT. Unlike existing deepfake detection methods merely focusing on low-level forgery patterns, the forgery detection process of our model can be regularized by generalizable high-level semantics from a pre-trained ViT and adapted by global and local low-level forgeries of deepfake data. Extensive experiments on several standard deepfake detection benchmarks validate the effectiveness of our approach. Notably, DeepFake-Adapter demonstrates a convincing advantage under cross-dataset and cross-manipulation settings. The source code is released at https://github.com/rshaojimmy/DeepFake-Adapter
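The idea of a bottleneck adapter placed in parallel to a frozen MLP layer can be illustrated with a minimal NumPy sketch. The abstract does not specify the adapter's internals, so everything here is an assumption: the generic down-project, nonlinearity, up-project design, the zero initialization of the up-projection, the sizes (768, 3072, 64), and the class names `FrozenMLP` and `BottleneckAdapter` are hypothetical and are not taken from the authors' code. Biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class FrozenMLP:
    """Stand-in for a pre-trained ViT MLP layer; its weights are never updated."""

    def __init__(self, dim, hidden):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, x):
        return gelu(x @ self.w1) @ self.w2

    def num_params(self):
        return self.w1.size + self.w2.size


class BottleneckAdapter:
    """Hypothetical trainable branch: down-project -> GELU -> up-project."""

    def __init__(self, dim, bottleneck):
        self.down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Assumption: zero-initialized up-projection, so the branch starts as a no-op.
        self.up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return gelu(x @ self.down) @ self.up

    def num_params(self):
        return self.down.size + self.up.size


def adapted_mlp(x, mlp, adapter):
    # Parallel placement: both branches read the same input and their outputs are summed.
    return mlp(x) + adapter(x)


dim, hidden, bottleneck = 768, 3072, 64      # assumed sizes, not from the paper
mlp = FrozenMLP(dim, hidden)
adapter = BottleneckAdapter(dim, bottleneck)

tokens = rng.normal(size=(197, dim))         # 196 patch tokens + 1 class token
out = adapted_mlp(tokens, mlp, adapter)

ratio = adapter.num_params() / mlp.num_params()
print(out.shape, f"adapter/MLP parameter ratio: {ratio:.3%}")
```

With these assumed sizes the adapter holds only a small fraction of the parameters of the MLP it sits beside, which is the sense in which such tuning is parameter-efficient: only the adapter weights would be trained while the backbone stays fixed.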

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the poor generalization of existing deepfake detection methods to unseen or low-quality samples. The authors attribute this to overfitting on low-level forgery patterns while high-level semantics are neglected, and they argue that high-level semantics from large pre-trained Vision Transformers (ViTs) are essential for generalizable forgery detection.

The paper proposes a parameter-efficient tuning method called DeepFake-Adapter, which adapts the high-level semantics of a large pre-trained ViT to deepfake detection by adding lightweight dual-level adapter modules while the backbone stays frozen. The modules are the Globally-aware Bottleneck Adapter (GBA) and the Locally-aware Spatial Adapter (LSA), which capture global and local low-level forgery cues, respectively, and interact with the high-level semantics in the ViT to produce more generalizable forgery representations.

The main contributions are:

1. **Utilization of High-Level Semantics**: Adapter-based tuning is introduced to deepfake detection for the first time, drawing on high-level semantics from large pre-trained ViTs.
2. **Dual-Level Adapters**: A novel dual-level adapter, DeepFake-Adapter, is proposed. It comprises GBA and LSA, which adapt a pre-trained ViT and yield more generalizable forgery representations.
3. **Experimental Validation**: Extensive quantitative and qualitative experiments on standard benchmarks show the effectiveness of the method, especially in cross-dataset and cross-manipulation settings.

Together, these contributions aim to make deepfake detection more generalizable and more robust across different forgery types and degraded inputs.
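The interaction between local spatial features and ViT features can be pictured as cross-attention. The sketch below is a generic single-head cross-attention in NumPy, not the authors' LSA module. The direction of attention (adapter features as queries, ViT tokens as keys and values), the residual connection, the feature sizes, and the names `cross_attention`, `local_feats`, and `vit_tokens` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product attention from `queries` onto `context`."""
    q = queries @ wq
    k = context @ wk
    v = context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 64                                     # assumed feature width

# Hypothetical inputs: a 7x7 map of local features and 14x14 ViT patch tokens.
local_feats = rng.normal(size=(49, dim))
vit_tokens = rng.normal(size=(196, dim))

wq, wk, wv = (rng.normal(0.0, dim ** -0.5, (dim, dim)) for _ in range(3))

attended, weights = cross_attention(local_feats, vit_tokens, wq, wk, wv)
fused = local_feats + attended               # assumed residual connection

print(fused.shape, weights.shape)
```

Each local feature ends up as a weighted mixture of ViT token values added to itself, which is one plausible way for low-level spatial cues to draw on the backbone's high-level semantics.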

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture

Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer

Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization

ADT: Anti-Deepfake Transformer

FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection

DeepFake detection algorithm based on improved vision transformer

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

Audio-Visual Contrastive Pre-train for Face Forgery Detection

Decoupling Forgery Semantics for Generalizable Deepfake Detection

Voice-Face Homogeneity Tells Deepfake

Common Forgery Artifact Driven Deepfake Face Detection

Deepfake Detection Scheme Based on Vision Transformer and Distillation

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection

UniForensics: Face Forgery Detection via General Facial Representation

Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens