A Timely Survey on Vision Transformer for Deepfake Detection

Zhikan Wang, Zhongyao Cheng, Jiajie Xiong, Xun Xu, Tianrui Li, Bharadwaj Veeravalli, Xulei Yang

2024-05-14

Abstract: In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection, serving as a valuable reference for both academic and practical pursuits in this domain.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the threat posed by increasingly realistic deepfake content. As deepfake technology develops rapidly, the content it generates endangers individual rights, national security, and public safety. Various detection methods have emerged in response, and among them the methods based on Vision Transformer (ViT) perform especially well in generality and efficiency. The paper therefore presents a timely review of ViT-based deepfake detection models, to help researchers understand the latest progress in the field and to guide future research.

Specifically, the main objectives of the paper are:

1. **Provide an up-to-date review**: As of February 28, 2024, the paper classifies and outlines 14 ViT-based deepfake detection models, divided into three categories: standalone, sequential, and parallel architectures.
2. **Analyze existing research**: It describes the structure and characteristics of each model and explores the unsolved problems in existing research.
3. **Propose future research directions**: Based on its analysis of existing research, it proposes potential directions for advancing deepfake detection technology.
4. **Emphasize the advantages of ViT**: It highlights ViT's ability to capture global dependencies in images, which is crucial for identifying the subtle differences and features in deepfake content.

Through these objectives, the paper aims to provide a valuable reference for academics and practitioners, helping them understand and address the challenges that deepfakes pose.

Protego: Detecting Adversarial Examples for Vision Transformers Via Intrinsic Capabilities

Deepfake Video Detection Using Convolutional Vision Transformer

Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis

FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection

ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection

DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Deepfake Detection with Deep Learning: Convolutional Neural Networks versus Transformers

Deepfake Detection Using Spatiotemporal Transformer

A Contemporary Survey on Deepfake Detection: Datasets, Algorithms, and Challenges

ADT: Anti-Deepfake Transformer

A Survey of Deepfake Detection Methods: Innovations, Accuracy, and Future Directions

Deep Learning for Deepfakes Creation and Detection: A Survey

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Enhancing General Face Forgery Detection via Vision Transformer with Low-Rank Adaptation

DeepFake detection algorithm based on improved vision transformer

Deep Learning Technology for Face Forgery Detection: A Survey

Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer

Deepfake Generation and Detection: A Benchmark and Survey

Deepfake Detection Scheme Based on Vision Transformer and Distillation