Abstract: The rapid advancement of deep learning and large-scale AI models has made it far easier to create deepfakes, which generate, edit, and replace faces in images and videos. This growing ease of use has turned the malicious application of forged faces into a significant threat and has made deepfake detection harder. Current deepfake detection methods, which predominantly employ data-driven CNN classification models, have achieved notable success, but they generalize poorly and are not sufficiently robust to data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. The framework employs a restricted self-attention mechanism that applies self-attention flexibly to deep CNN features, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. The attention mechanism has a modular design and can be integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, effectively isolating forgery indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to combine RGB and frequency information into a comprehensive facial representation. Tested on the high-quality version of the FaceForensics++ dataset, our method attained a detection accuracy of 97.92%, outperforming other approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms the method's robust generalizability, offering a new solution for detecting high-quality deepfake videos.
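To make the idea of restricting self-attention over CNN features more concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the ReLAF-Net implementation. It assumes that "restricted" means attention confined to non-overlapping square windows of a feature map, uses identity query/key/value projections for brevity, and adds a residual connection. The function name `windowed_self_attention` and the `window` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(feat, window):
    """Self-attention restricted to non-overlapping windows (illustrative only).

    feat:   CNN feature map of shape (C, H, W).
    window: side length of each square window; H and W must be divisible by it.
    Returns an array of shape (C, H, W).
    """
    c, h, w = feat.shape
    assert h % window == 0 and w % window == 0
    n_h, n_w = h // window, w // window

    # Partition into windows: (n_h, n_w, tokens_per_window, C).
    x = feat.reshape(c, n_h, window, n_w, window)
    x = x.transpose(1, 3, 2, 4, 0).reshape(n_h, n_w, window * window, c)

    # Assumption: identity Q/K/V projections; a real module would learn them.
    scores = x @ x.transpose(0, 1, 3, 2) / np.sqrt(c)  # (n_h, n_w, T, T)
    attn = softmax(scores, axis=-1)
    out = attn @ x                                      # (n_h, n_w, T, C)

    # Undo the window partition and add a residual connection.
    out = out.reshape(n_h, n_w, window, window, c)
    out = out.transpose(4, 0, 2, 1, 3).reshape(c, h, w)
    return feat + out
```

In this sketch a small `window` captures only fine-grained local relationships, while setting `window` equal to the full feature-map size reduces to ordinary global self-attention. Because the output has the same shape as the input, such a block could in principle be placed between existing CNN stages, which is one way to read the modular design described above.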

Temporal Localization of Deepfake Audio Based on Self-Supervised Pretraining Models and Transformer Classifier

An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

Transferring Audio Deepfake Detection Capability Across Languages

Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier

Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss

Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

An Audio Copy-Move Forgery Localization Model by CNN-Based Spectral Analysis

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection

Self-supervised Transformer for Deepfake Detection

Audio-Visual Contrastive Pre-train for Face Forgery Detection

Efficient Deepfake Audio Detection Using Spectro-Temporal Analysis and Deep Learning

Speaker Recognition-Assisted Robust Audio Deepfake Detection

Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection

A lightweight feature extraction technique for deepfake audio detection