Audio Deepfake Detection: A Survey

Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao
2023-08-29
Abstract: Audio deepfake detection is an emerging and active research topic. A growing number of studies have investigated deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, no comprehensive survey has provided researchers with a systematic overview of these developments together with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classification methods, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection. The survey shows that future research should address the lack of large-scale datasets in the wild, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the lack of a systematic review of the field of audio deepfake detection. Although research on deepfake detection already exists, most of it focuses on specific aspects, such as spoofing attacks on Automatic Speaker Verification (ASV) systems and their countermeasures, and does not comprehensively review how audio deepfake detection techniques have developed. The survey therefore aims to provide a systematic overview covering the types of audio deepfakes, competitions, datasets, feature extraction, classification methods, and evaluation metrics, and to compare the performance of representative features and classifiers on different datasets through a unified experimental analysis. It also points out key problems for future research: the lack of large-scale datasets recorded in real-world conditions, the poor generalization of existing detection methods to unknown spoofing attacks, and the interpretability of detection results.
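To make the idea of a unified comparison concrete, here is a minimal sketch of one metric that could be used to score every feature and classifier pair in the same way. The summary above does not name the metric. The equal error rate (EER), a common choice in ASVspoof-style evaluations, is assumed here. The function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the toy scores are all illustrative assumptions, not taken from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide (genuine).
    Returns (eer, threshold) at the point where the false acceptance
    rate and false rejection rate are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores from a hypothetical detector (not real results).
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Running the same function on the scores of each system and each dataset gives numbers that are directly comparable, which is the point of a unified evaluation. A production implementation would usually interpolate between thresholds rather than pick the nearest observed score.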