Abstract: In recent years, the abuse of a face-swap technique called deepfake has raised enormous public concern. A large number of deepfake videos (known as "deepfakes") have been crafted and uploaded to the internet, calling for effective countermeasures. One promising countermeasure is deepfake detection. Several deepfake datasets, such as DeepfakeDetection and FaceForensics++, have been released to support the training and testing of deepfake detectors. While these datasets have greatly advanced deepfake detection, most of their real videos are filmed with a few volunteer actors in limited scenes, and their fake videos are crafted by researchers using a few popular deepfake software tools. Detectors developed on these datasets may therefore be less effective against real-world deepfakes on the internet. To better support detection of real-world deepfakes, in this paper we introduce a new dataset, WildDeepfake, which consists of 7,314 face sequences extracted from 707 deepfake videos collected entirely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop detectors and test their effectiveness against real-world deepfakes. We conduct a systematic evaluation of a set of baseline detection networks on both existing datasets and WildDeepfake, and show that WildDeepfake is indeed a more challenging dataset, on which detection performance can decrease drastically. We also propose two Attention-based Deepfake Detection Networks (ADDNets), one 2D and one 3D, that leverage attention masks on real/fake faces for improved detection. We empirically verify the effectiveness of ADDNets on both existing datasets and WildDeepfake. The dataset is available at: https://github.com/OpenTAI/wild-deepfake

DeepSpeak Dataset v1.0

FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

Towards Understanding of Deepfake Videos in the Wild

KoDF: A Large-scale Korean DeepFake Detection Dataset

Deepfake Videos in the Wild: Analysis and Detection

Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

Diffuse or Confuse: A Diffusion Deepfake Speech Dataset

1M-Deepfakes Detection Challenge

Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors

AntiDeepFake: AI for Deep Fake Speech Recognition

DeepFaceLab: Integrated, Flexible and Extensible Face-Swapping Framework

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset