DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu
2023-06-02
Abstract: Existing deepfake detection methods fail to generalize well to unseen or degraded samples, which can be attributed to the over-fitting of low-level forgery patterns. Here we argue that high-level semantics are also indispensable recipes for generalizable forgery detection. Recently, large pre-trained Vision Transformers (ViTs) have shown promising generalization capability. In this paper, we propose the first parameter-efficient tuning approach for deepfake detection, namely DeepFake-Adapter, to effectively and efficiently adapt the generalizable high-level semantics from large pre-trained ViTs to aid deepfake detection. Given large pre-trained models but limited deepfake data, DeepFake-Adapter introduces lightweight yet dedicated dual-level adapter modules to a ViT while keeping the model backbone frozen. Specifically, to guide the adaptation process to be aware of both global and local forgery cues of deepfake data, 1) we not only insert Globally-aware Bottleneck Adapters in parallel to MLP layers of ViT, 2) but also actively cross-attend Locally-aware Spatial Adapters with features from ViT. Unlike existing deepfake detection methods merely focusing on low-level forgery patterns, the forgery detection process of our model can be regularized by generalizable high-level semantics from a pre-trained ViT and adapted by global and local low-level forgeries of deepfake data. Extensive experiments on several standard deepfake detection benchmarks validate the effectiveness of our approach. Notably, DeepFake-Adapter demonstrates a convincing advantage under cross-dataset and cross-manipulation settings. The source code is released at https://github.com/rshaojimmy/DeepFake-Adapter.
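The abstract places the Globally-aware Bottleneck Adapters in parallel to the MLP layers of a frozen ViT. The NumPy sketch below illustrates that arrangement under assumptions that are ours, not the paper's: a down-project/ReLU/up-project bottleneck with a fixed scale of 0.1, a zero-initialised up-projection, a non-affine LayerNorm, a ReLU stand-in for the ViT's GELU, and the hypothetical names `FrozenMLP`, `BottleneckAdapter`, and `mlp_sublayer`. It is only meant to show where the trainable parameters sit and roughly how many there are.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Non-affine LayerNorm over the channel (last) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


class FrozenMLP:
    """Stand-in for a pre-trained ViT MLP (ReLU instead of GELU); never updated."""

    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, x):
        return relu(x @ self.w1) @ self.w2


class BottleneckAdapter:
    """Hypothetical globally-aware bottleneck adapter: dim -> bottleneck -> dim."""

    def __init__(self, dim, bottleneck, rng, scale=0.1):
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Zero-initialised up-projection: the adapter outputs zeros at the start,
        # so the block initially behaves exactly like the frozen ViT block.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)
        self.scale = scale

    def __call__(self, x):
        hidden = relu(x @ self.w_down + self.b_down)
        return self.scale * (hidden @ self.w_up + self.b_up)

    def num_params(self):
        # Only these arrays would be trained; the FrozenMLP weights stay fixed.
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


def mlp_sublayer(x, mlp, adapter):
    """MLP half of a pre-norm ViT block, with the adapter parallel to the frozen MLP."""
    h = layer_norm(x)
    return x + mlp(h) + adapter(h)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(197, 32))  # toy input: 196 patch tokens + [CLS], width 32
mlp = FrozenMLP(32, 128, rng)
adapter = BottleneckAdapter(32, 8, rng)
out = mlp_sublayer(tokens, mlp, adapter)
print(out.shape)                                           # (197, 32)
print(np.allclose(out, tokens + mlp(layer_norm(tokens))))  # True at initialisation
print(BottleneckAdapter(768, 64, rng).num_params())        # 99136 at ViT-B width
```

With a bottleneck of 64 (also an assumption) at ViT-B width 768, one adapter adds 99,136 trainable parameters, about 2% of the roughly 4.7M weights in a ViT-B MLP (768 → 3072 → 768). Zero-initialising the up-projection means tuning starts from the unmodified pre-trained model.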
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor generalization of existing deepfake detection methods to unseen or low-quality (degraded) samples. The authors attribute this to over-fitting to low-level forgery patterns while high-level semantics are neglected, and they argue that the high-level semantics learned by large pre-trained Vision Transformers (ViTs) are key to generalizable forgery detection.

To exploit these semantics without full fine-tuning, the paper proposes DeepFake-Adapter, a parameter-efficient tuning method that keeps the ViT backbone frozen and trains only lightweight dual-level adapter modules: Globally-aware Bottleneck Adapters (GBA), inserted in parallel to the MLP layers of the ViT, and Locally-aware Spatial Adapters (LSA), which cross-attend with ViT features. GBA and LSA capture global and local low-level forgery cues, respectively, and these cues interact with the high-level semantics of the pre-trained ViT to produce more generalizable forgery representations.

The main contributions are:
1. **Utilization of High-Level Semantics**: The paper is the first to bring parameter-efficient adapter tuning to deepfake detection, exploiting the high-level semantics of large pre-trained ViTs.
2. **Dual-Level Adapters**: DeepFake-Adapter combines GBA and LSA to adapt a pre-trained ViT to deepfake data and produce more generalizable forgery representations.
3. **Experimental Validation**: Extensive quantitative and qualitative experiments on standard deepfake detection benchmarks show the method's advantage, especially in cross-dataset and cross-manipulation settings.

Together, these contributions aim to make deepfake detection more general and more robust to different types of forgeries and to various post-processing operations.
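The abstract describes Locally-aware Spatial Adapters that cross-attend with ViT features but does not spell out the module internals. The NumPy sketch below shows one plausible reading, with every design detail an assumption rather than the authors' implementation: local cues come from a per-channel 3×3 convolution over the 14×14 patch grid, ViT patch tokens act as queries and the local features as keys and values in single-head cross-attention, and the attended result is added back through a scalar gate initialised to zero. `LocalSpatialAdapter`, `depthwise_conv3x3`, and `attn_dim` are hypothetical names, and the [CLS] token is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depthwise_conv3x3(grid, kernel):
    """Per-channel 3x3 convolution (cross-correlation) with zero padding.

    grid: (H, W, C) map of patch tokens; kernel: (3, 3, C).
    """
    h, w, _ = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx]
    return out


class LocalSpatialAdapter:
    """Hypothetical locally-aware spatial adapter injected via cross-attention."""

    def __init__(self, dim, rng, attn_dim=16):
        self.kernel = rng.normal(0.0, 0.1, (3, 3, dim))
        self.wq = rng.normal(0.0, 0.02, (dim, attn_dim))
        self.wk = rng.normal(0.0, 0.02, (dim, attn_dim))
        self.wv = rng.normal(0.0, 0.02, (dim, dim))
        self.gamma = 0.0  # zero-initialised gate: injection starts as a no-op

    def local_features(self, patch_tokens, grid_size):
        # Fold tokens back onto the patch grid so neighbouring patches are adjacent.
        grid = patch_tokens.reshape(grid_size, grid_size, -1)
        return depthwise_conv3x3(grid, self.kernel).reshape(grid_size * grid_size, -1)

    def __call__(self, vit_tokens, grid_size):
        local = self.local_features(vit_tokens, grid_size)
        q = vit_tokens @ self.wq  # queries from the frozen ViT features
        k = local @ self.wk       # keys and values from the local branch
        v = local @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return vit_tokens + self.gamma * (attn @ v), attn


rng = np.random.default_rng(0)
grid_size, dim = 14, 32
patches = rng.normal(size=(grid_size * grid_size, dim))  # toy ViT patch tokens
lsa = LocalSpatialAdapter(dim, rng)
out, attn = lsa(patches, grid_size)
print(out.shape)                             # (196, 32)
print(np.allclose(attn.sum(axis=-1), 1.0))   # True: each query's weights sum to 1
print(np.allclose(out, patches))             # True while gamma == 0
```

Starting `gamma` at zero lets the frozen ViT features pass through unchanged until training learns how much local evidence to inject.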