CFAD: A Chinese Dataset for Fake Audio Detection

Haoxin Ma, Jiangyan Yi, Chenglong Wang, Xinrui Yan, Jianhua Tao, Tao Wang, Shiming Wang, Ruibo Fu
2023-07-18
Abstract: Fake audio detection is a growing concern, and some relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise adding at five different signal-to-noise ratios, and six codecs are considered for audio transcoding (format conversion). The CFAD dataset can be used not only for fake audio detection but also for detecting the algorithms of fake utterances for audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods with generalization remain challenging. The CFAD dataset is publicly available at: https://zenodo.org/record/8122764.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to address the lack of standard public datasets in the field of Chinese fake audio detection. Specifically:

1. **Limitations of Existing Datasets**:
   - **Lack of Generality**: Existing fake audio detection methods have poor generalization capabilities when facing unknown types, noise interference, or different codec formats.
   - **Noise and Codec Conditions**: Fake audio in real-world scenarios often comes with noise or has undergone media codec processing, which existing datasets often overlook.
   - **Language Limitation**: Most datasets are in English, lacking standard Chinese datasets.

2. **Filling the Gap**:
   - **Building the CFAD Dataset**: The paper designs a Chinese fake audio detection dataset (CFAD) to study more general detection methods.
   - **Diversity and Complexity**: The CFAD dataset considers fake audio generated by 12 mainstream voice generation technologies and simulates noise and codec conditions in real-world scenarios.
   - **Detailed Labels**: Each audio file provides detailed label information, including fake audio type, real source, noise type, signal-to-noise ratio (SNR), and media codec format.

3. **Application Scenarios**:
   - **Fake Audio Detection**: The CFAD dataset supports generalization research on unknown types and robustness research under mismatched conditions.
   - **Audio Forensics**: The dataset can also be used for research on identifying fake audio generation algorithms.

By constructing the CFAD dataset, the paper hopes to advance the field of fake audio detection, especially in complex and real-world applications.
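
To make the noise-adding step concrete, the following is a minimal sketch of mixing a noise clip into a clean utterance at a target SNR. It is not the authors' pipeline: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, the SNR levels in the demo are placeholders (the summary above does not list the five values used), and plain Python lists stand in for real audio samples. It relies only on the standard definition SNR(dB) = 10 log10(P_speech / P_noise).

```python
import math
import random


def _power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper: the noise is tiled or trimmed to the speech length,
    then scaled so that P_speech / P_noise_scaled = 10 ** (snr_db / 10).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip if it is shorter than the utterance, then trim.
    reps = len(speech) // len(noise) + 1
    noise = (list(noise) * reps)[: len(speech)]

    p_speech = _power(speech)
    p_noise = _power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent; cannot scale to a target SNR")

    target_p_noise = p_speech / (10 ** (snr_db / 10))
    scale = math.sqrt(target_p_noise / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture given the clean reference signal."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(_power(clean) / _power(residual))


if __name__ == "__main__":
    # Synthetic stand-ins for a real utterance and a real noise recording.
    speech = [math.sin(0.05 * i) for i in range(2000)]
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(500)]

    # Placeholder SNR levels, chosen only for illustration.
    for snr in (0, 10, 20):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the level of the utterance unchanged across SNR conditions, which is one common design choice; a real pipeline would also need to guard against clipping before writing the audio to a file.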