DeepSpeak Dataset v1.0

Sarah Barrington, Matyas Bohacek, Hany Farid
2024-08-31
Abstract: We describe a large-scale dataset, DeepSpeak, of real and deepfake footage of people talking and gesturing in front of their webcams. The real videos in this first version of the dataset consist of 17 hours of footage from 220 diverse individuals. Constituting more than 26 hours of footage, the fake videos consist of a range of different state-of-the-art face-swap and lip-sync deepfakes with natural and AI-generated voices. We expect to release future versions of this dataset with different and updated deepfake technologies. This dataset is made freely available for research and non-commercial uses; requests for commercial use will be considered.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper introduces DeepSpeak, a large-scale dataset intended to support the media forensics research community in developing and improving techniques for detecting deepfake audio, images, and videos. The dataset contains both real and deepfake videos. The real videos come from 220 different participants and total 17 hours of footage; the deepfake videos, generated with a range of face-swap and lip-sync techniques, total more than 26 hours. The paper describes how the dataset was collected, including how the real videos were recorded and how each type of deepfake video was generated. It also outlines plans for future versions of the dataset that keep pace with new deepfake technologies. Finally, the paper emphasizes the importance of sharing the dataset and welcomes feedback and suggestions from the community.