Abstract: The rapid progress of deep speech synthesis models poses significant threats to society, such as the malicious manipulation of content. This has led to a growing number of studies on detecting so-called deepfake audio. However, existing work focuses on binary classification of real versus fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, binary classification alone is insufficient: it is essential to identify the source of the deepfake audio. Audio deepfake attribution has therefore emerged as a new challenge. To this end, we designed the first deepfake audio dataset for attributing audio generation tools, called Audio Deepfake Attribution (ADA), and conducted a comprehensive investigation of system fingerprints. To handle the continuous emergence of unknown audio generation tools in the real world, we propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA). CRML enhances the global directional variation of representations, ensuring that the learned representations are discriminative, with strong intra-class similarity and inter-class discrepancy among known classes. Finally, the strong class discrimination capability learned from known classes is extended to both known and unknown classes. Experimental results demonstrate that the CRML method effectively addresses open-set risks in real-world scenarios. The dataset is publicly available at https://zenodo.org/records/13318702 and https://zenodo.org/records/13340666.
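
The open-set setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' CRML method: it assumes a single directional center per known class (CRML is described as multi-center), and the function names, the `"unknown"` label, and the `threshold` value are all hypothetical choices made for illustration. A clip's embedding is attributed to the closest known tool by cosine similarity, and rejected as unknown when no center is similar enough.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def class_centers(embeddings, labels):
    """Compute one unit-length center per known class (simplifying assumption)."""
    sums = {}
    for emb, label in zip(embeddings, labels):
        unit = l2_normalize(emb)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
    return {label: l2_normalize(acc) for label, acc in sums.items()}


def attribute(embedding, centers, threshold=0.8):
    """Return the best-matching known class, or "unknown" below the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    best_label, best_sim = max(
        ((label, cosine(embedding, center)) for label, center in centers.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unknown"


# Toy 2-D embeddings for two hypothetical generation tools.
train_embeddings = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
train_labels = ["tool_a", "tool_a", "tool_b", "tool_b"]
centers = class_centers(train_embeddings, train_labels)

known_prediction = attribute([0.9, 0.05], centers)
unknown_prediction = attribute([1.0, 1.0], centers)
```

The sketch shows why strong intra-class similarity and inter-class discrepancy matter in this setting: the tighter each known class clusters around its direction, the more safely a similarity threshold can separate known tools from unseen ones.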

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

Ghost-in-Wave: How Speaker-Irrelative Features Interfere DeepFake Voice Detectors

Audio Deepfake Attribution: An Initial Dataset and Investigation

Transferring Audio Deepfake Detection Capability Across Languages

Audio Deepfake Detection: A Survey

Speaker Recognition-Assisted Robust Audio Deepfake Detection

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

FakeSound: Deepfake General Audio Detection

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems

CFAD: A Chinese Dataset for Fake Audio Detection

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Does Audio Deepfake Detection Generalize?

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Source Tracing of Audio Deepfake Systems

Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

SafeEar: Content Privacy-Preserving Audio Deepfake Detection

EmoFake: An Initial Dataset for Emotion Fake Audio Detection

Cross-Domain Audio Deepfake Detection: Dataset and Analysis