Abstract: The growing prominence of audio deepfake detection is driven by its wide range of applications, notably protecting the public from fraud and other malicious activities, and the field calls for greater attention and research. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as locating the manipulated intervals in partially fake audio and identifying the source that generated a fake audio clip. Both tasks have real-life implications, notably in audio forensics, law enforcement, and the construction of reliable and trustworthy evidence. To further foster research in this area, in this article we describe the datasets used in the fake game, manipulation region location, and deepfake algorithm recognition tracks of the challenge. We also analyze the technical methodologies used by the top-performing participants in each task and note the commonalities and differences in their approaches. Finally, we discuss the current technical limitations identified through this analysis and provide a roadmap for future research directions. The dataset is available for download at http://addchallenge.cn/downloadADD2023.
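Locating manipulated intervals in partially fake audio is often framed as frame-level tagging followed by merging consecutive "fake" frames into time intervals. The sketch below illustrates only that merging step. It assumes a system that outputs one fake probability per fixed-length frame. The function name `frames_to_intervals`, the 20 ms frame hop, and the 0.5 threshold are hypothetical choices, not the challenge's official post-processing.

```python
from typing import List, Optional, Tuple


def frames_to_intervals(
    fake_probs: List[float],
    frame_sec: float = 0.02,  # hypothetical frame hop in seconds
    threshold: float = 0.5,   # hypothetical decision threshold
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive 'fake' frames into (start, end) intervals in seconds."""
    intervals: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current fake run
    for i, p in enumerate(fake_probs):
        is_fake = p >= threshold
        if is_fake and start is None:
            start = i  # a new fake run begins at frame i
        elif not is_fake and start is not None:
            # the run covered frames start..i-1, so it ends at the start of frame i
            intervals.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        # close a fake run that extends to the end of the utterance
        intervals.append((start * frame_sec, len(fake_probs) * frame_sec))
    return intervals
```

For example, `frames_to_intervals([0.1, 0.2, 0.9, 0.8, 0.1, 0.7], frame_sec=0.5)` returns `[(1.0, 2.0), (2.5, 3.0)]`. An empty result would correspond to an utterance judged fully genuine at the sentence level.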

What problem does this paper attempt to address?

The main goal of this paper is to address several key issues in audio deepfake detection and to advance the related technologies. Specifically:

1. **Multi-task Challenge Design**: The paper introduces the ADD 2023 challenge, which goes beyond traditional binary (real/fake) classification to broaden the capabilities of audio deepfake detection. The challenge includes three main tracks:
   - **Audio Fake Game (FG)**: Divided into a generation task (FG-G) and a detection task (FG-D), simulating an adversarial game between attackers and defenders.
   - **Manipulation Region Location (RL)**: Identifying the specific time segments that have been tampered with in partially fake audio.
   - **Deepfake Algorithm Recognition (AR)**: Determining which algorithm generated a given fake audio clip.
2. **Dataset Design**: To better simulate real-world scenarios, the paper details the datasets used for each track. These datasets include not only genuine speech but also fake audio covering different generation techniques and environmental conditions. For example:
   - The **generation task (FG-G)** uses the AISHELL-3 dataset for training, with the aim of producing high-quality, realistic generated audio.
   - The **detection task (FG-D)** includes fake audio samples from various generation models (such as HiFi-GAN and LPCNet).
   - The **manipulation region location (RL)** dataset simulates partial tampering by splicing together segments of real recordings and/or fake audio.
   - The **deepfake algorithm recognition (AR)** dataset includes audio generated by both known and unknown algorithms, to test how well models recognize the source algorithm under different conditions.
3. **Evaluation Metrics**: The paper also defines evaluation metrics for each track, such as Deception Success Rate (DSR), Equal Error Rate (EER), sentence-level accuracy (A_s), and segment-level F1 score (F1_s), to comprehensively assess the performance of the participating systems.
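The evaluation metrics named above can be illustrated with the following minimal sketch in plain Python. It is not the challenge's official scoring code, and the function names are hypothetical. The sketch rests on these assumptions:

- The EER is approximated by sweeping every observed score as a threshold and averaging the false rejection and false acceptance rates where they are closest. Higher scores are assumed to mean "more likely genuine".
- Segment-level F1 is computed over per-frame labels, with fake (1) as the positive class. This assumes segments have already been discretized into fixed frames.
- DSR is taken to be the fraction of (detector, fake utterance) pairs in which the detector was fooled.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER; label 1 = genuine, 0 = fake; higher score = more likely genuine."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: accept an utterance as genuine if its score >= t.
    for t in sorted(set(scores)):
        frr = sum(s < t for s in pos) / len(pos)   # genuine utterances rejected
        far = sum(s >= t for s in neg) / len(neg)  # fake utterances accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def segment_f1(pred: Sequence[int], ref: Sequence[int]) -> float:
    """F1 over per-frame labels, treating fake (1) as the positive class."""
    tp = sum(p == 1 and r == 1 for p, r in zip(pred, ref))
    fp = sum(p == 1 and r == 0 for p, r in zip(pred, ref))
    fn = sum(p == 0 and r == 1 for p, r in zip(pred, ref))
    if tp == 0:
        return 0.0  # assumed convention when no fake frame is correctly detected
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def deception_success_rate(fooled: Sequence[Sequence[bool]]) -> float:
    """fooled[a][n] is True if detector a classified fake utterance n as genuine."""
    total = sum(len(row) for row in fooled)
    return sum(sum(row) for row in fooled) / total
```

Sweeping only observed scores keeps the sketch dependency-free. Official scoring tools may instead interpolate between ROC points, which can give slightly different EER values.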
In summary, this paper aims to advance audio deepfake detection by proposing a series of challenging tasks and carefully designed datasets, with the goal of making detection more reliable and effective in practical applications.

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
