Abstract: The rapid growth of deepfake technology has raised significant concerns about digital media integrity. Detecting deepfakes is crucial for safeguarding digital media. However, most standard image classifiers fail to distinguish between fake and real faces. Our analysis reveals that this failure is due to the model's inability to explicitly focus on the artefacts typically found in deepfakes. We propose an enhanced architecture based on the GenConViT model, which incorporates a weighted loss and updated augmentation techniques and includes masked eye pretraining. The proposed model improves the F1 score by 1.71% and the accuracy by 4.34% on the Celeb-DF v2 dataset. The source code for our model is available at https://github.com/Monu-Khicher-1/multi-stage-learning.
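
The two data-side changes named in the abstract, updated augmentation and masked eye pretraining, can be pictured as simple array transforms. This is a hypothetical sketch, not the authors' code: it assumes NumPy, restricts rotation to multiples of 90 degrees so that no interpolation is needed (the paper's actual rotation angles are not stated here), and takes the eye region as a given bounding box, which a real pipeline would derive from facial landmarks. The names `basic_augment` and `mask_eyes` are illustrative only.

```python
import numpy as np


def basic_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Geometric-only augmentation (hypothetical helper): random horizontal
    flip plus a random rotation by a multiple of 90 degrees, a simplifying
    assumption. Pixel values are only rearranged, never altered, so no noise,
    blur, sharpening or brightness/contrast change is introduced."""
    if rng.random() < 0.5:
        image = np.fliplr(image)  # mirror left-right
    k = int(rng.integers(0, 4))  # number of 90-degree turns: 0..3
    return np.rot90(image, k=k, axes=(0, 1))


def mask_eyes(image: np.ndarray,
              eye_box: tuple[int, int, int, int],
              fill: int = 0) -> np.ndarray:
    """Return a copy of `image` with the eye region occluded (hypothetical helper).

    `eye_box` is (top, left, bottom, right) in pixels. It is an assumed input
    here; a real pipeline would compute it from facial landmarks."""
    top, left, bottom, right = eye_box
    masked = image.copy()
    masked[top:bottom, left:right, ...] = fill  # works for HxW and HxWxC arrays
    return masked


# Usage on a toy 4x4 greyscale "face".
rng = np.random.default_rng(seed=0)
face = np.arange(16, dtype=np.uint8).reshape(4, 4)
augmented = basic_augment(face, rng)            # same pixels, new orientation
pretrain_input = mask_eyes(face, (1, 0, 2, 4))  # row 1 stands in for the eyes
```

Keeping augmentation purely geometric means any blending or texture artefacts in a fake face survive unchanged, while occluding the eyes during a pretraining stage forces the network to look for evidence elsewhere in the face.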

What problem does this paper attempt to address?

This paper addresses several key challenges in deepfake detection. Specifically, the authors identify the following issues:

1. **Limitations of existing image classifiers**
   - Standard image classifiers cannot reliably distinguish real faces from fake ones, mainly because they do not explicitly focus on the artefacts commonly found in deepfakes. As a result, these models perform poorly in real-world applications.
2. **Improper use of data augmentation techniques**
   - Many existing deepfake detection methods apply standard augmentations such as Gaussian noise, random brightness/contrast and sharpening. The altered images these techniques produce disrupt the conditions needed for reliable detection and degrade the model's performance.
3. **Over-reliance on eye features**
   - During training, deep neural networks rely too heavily on the eyes as a distinguishing feature. This makes the model prone to overfitting and weak at exploiting other facial features.
4. **Class imbalance**
   - Deepfake detection datasets are severely imbalanced: fake images far outnumber real ones. This imbalance hurts the model's generalization and biases it toward classifying every image as fake.

To address these problems, the authors propose a multi-stage approach to deepfake detection that includes:

- **Improved data augmentation**: use only basic augmentations (such as rotation and flipping) to avoid introducing noise.
- **Pretraining on eye-occluded data**: pretraining on a dataset in which the eyes are occluded pushes the model to learn other facial features.
- **Weighted loss function**: a weighted loss counters the class imbalance and improves the model's ability to recognize real images.
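
A weighted loss for the fake-heavy class imbalance can be sketched as a class-weighted binary cross-entropy. The source does not say how the weights are chosen, so this minimal sketch assumes inverse-frequency ("balanced") weights, `n_samples / (n_classes * count_c)`, and the label convention 1 = fake, 0 = real. Both helpers are hypothetical and written in plain Python for readability; a training framework would normally use its built-in loss with a weight argument instead.

```python
import math


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Hypothetical helper: inverse-frequency weights,
    n_samples / (n_classes * count(class)), so every class contributes
    the same total weight regardless of how many samples it has."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def weighted_bce(probs: list[float], labels: list[int],
                 weights: dict[int, float], eps: float = 1e-7) -> float:
    """Mean class-weighted binary cross-entropy (hypothetical helper).
    probs[i] is the predicted probability that sample i is fake (label 1)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        nll = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * nll
    return total / len(labels)


# Hypothetical imbalanced batch: 8 fake (1) and 2 real (0) images.
labels = [1] * 8 + [0] * 2
weights = balanced_class_weights(labels)  # {0: 2.5, 1: 0.625}
# A model that leans towards "fake" for every image.
probs = [0.9] * 10
unweighted = weighted_bce(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_bce(probs, labels, weights)
```

With 8 fake and 2 real samples, each real image gets weight 2.5 and each fake image 0.625, so both classes contribute equally to the total, and errors on real images cost four times as much per sample. The weighted loss is therefore larger for a model that defaults to "fake", which is the bias the weighting is meant to counter.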
With these improvements, the authors' model increases the F1 score by 1.71% and the accuracy by 4.34% on the Celeb-DF v2 dataset.
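
Because deepfake datasets are imbalanced, accuracy alone can be misleading, which is one reason to report F1 as well. The sketch below uses hypothetical labels, not Celeb-DF v2 data, to show how the two metrics are computed and how they diverge when a detector labels everything as fake. It assumes 1 = fake and 0 = real; the source does not state which F1 variant (per-class, macro or weighted) the reported figure refers to, nor whether the gains are absolute or relative.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: list[int], y_pred: list[int], positive: int) -> float:
    """F1 = 2TP / (2TP + FP + FN) for the class treated as `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced test set: 90 fake (1) and 10 real (0) images.
y_true = [1] * 90 + [0] * 10
# A degenerate detector that calls every image fake.
y_pred = [1] * 100

acc = accuracy(y_true, y_pred)         # 0.9
f1_fake = f1_score(y_true, y_pred, 1)  # 180 / 190, about 0.947
f1_real = f1_score(y_true, y_pred, 0)  # 0.0: no real image is recognized
macro_f1 = (f1_fake + f1_real) / 2     # about 0.474
print(f"accuracy={acc:.3f} f1_fake={f1_fake:.3f} "
      f"f1_real={f1_real:.3f} macro_f1={macro_f1:.3f}")
```

The all-fake detector reaches 90% accuracy while never recognizing a single real face; per-class or macro F1 exposes that failure, which is the behaviour the weighted loss targets.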

Herd Mentality in Augmentation -- Not a Good Idea! A Robust Multi-stage Approach towards Deepfake Detection

Related papers:

- Safeguarding Media Integrity: A Hybrid Optimized Deep Feature Fusion Based Deepfake Detection in Videos
- Multi-attentional Deepfake Detection
- Combating deepfakes: a comprehensive multilayer deepfake video detection framework
- A Performance Enhancement of Deepfake Video Detection through the use of a Hybrid CNN Deep Learning Model
- Hybrid Deepfake Detection Utilizing MLP and LSTM
- Hybrid Transformer Network for Deepfake Detection
- Real-Time Advanced Computational Intelligence for Deep Fake Video Detection
- Deep fake detection using an optimal deep learning model with multi head attention-based feature extraction scheme
- Comparison of Deepfake Detection Techniques through Deep Learning
- DeepFake detection algorithm based on improved vision transformer
- Auguring Fake Face Images Using Dual Input Convolution Neural Network
- Deep fake video/image detection using deep learning
- A Multimodal Framework for Deepfake Detection
- Multi-feature fusion based face forgery detection with local and global characteristics
- Hybrid Deep-Learning Model for Deepfake Detection in Video using Transfer Learning Approach
- A defensive attention mechanism to detect deepfake content across multiple modalities
- Real-Time Deepfake Video Detection Using Eye Movement Analysis with a Hybrid Deep Learning Approach
- An efficient deepfake video detection using robust deep learning
- Combining EfficientNet and Vision Transformers for Video Deepfake Detection
- Multiclass AI-Generated Deepfake Face Detection Using Patch-Wise Deep Learning Model