MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network

Karthik Sivarama Krishnan, Koushik Sivarama Krishnan
DOI: https://doi.org/10.1109/ICSC60394.2023.10441405
2023-11-07
Abstract: In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the threat that audio deepfakes pose to information authenticity. Audio deepfakes can convincingly mimic a person's voice, which creates significant risks of misinformation, defamation, extortion, and financial fraud. To counter this threat, the authors propose the Multi-Feature Audio Authenticity Network (MFAAN), which detects forged audio by combining multiple audio representations. MFAAN uses parallel paths, each built around a different feature: Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short-Time Fourier Transform (Chroma-STFT). Together these features capture the timbral and spectral characteristics of the audio as well as its harmonic content, which improves the ability to distinguish real recordings from fake ones. In preliminary evaluations on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, MFAAN achieves accuracies of 98.93% and 94.47%, respectively. These results support the effectiveness of MFAAN and suggest its potential in future efforts to combat audio deepfakes.
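To make the three representations concrete, the following is a minimal NumPy sketch of how MFCC, LFCC, and chroma features can be computed from one waveform. It is not the authors' implementation. The frame size, hop, filter count, coefficient count, and all function names are illustrative assumptions. In particular, the final step (time-averaging each feature and concatenating the results) is only a naive stand-in for fusion, since the text above does not specify how MFAAN's parallel paths are combined.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def triangular_filterbank(edges_hz, sr, n_fft):
    """Triangular filters; edges_hz holds n_filters + 2 increasing frequencies."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((len(edges_hz) - 2, len(freqs)))
    for i in range(len(edges_hz) - 2):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def cepstral_coeffs(power, bank, n_coeffs):
    """DCT-II of log filterbank energies, shape (n_coeffs, n_frames)."""
    log_energy = np.log(bank @ power + 1e-10)
    n_filters = bank.shape[0]
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_basis @ log_energy

def chroma_from_power(power, sr, n_fft):
    """Fold STFT bins into 12 pitch classes, normalised per frame."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)[1:]  # skip the DC bin
    midi = 69 + 12 * np.log2(freqs / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    out = np.zeros((12, power.shape[1]))
    for pc in range(12):
        out[pc] = power[1:][pitch_class == pc].sum(axis=0)
    return out / (out.sum(axis=0, keepdims=True) + 1e-10)

def extract_features(signal, sr=16000, n_fft=512, hop=256,
                     n_filters=40, n_coeffs=20):
    """Return the three representations, one per hypothetical network path."""
    power = stft_power(signal, n_fft, hop)
    # MFCC and LFCC differ only in how the filter edges are spaced.
    mel_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2),
                                      n_filters + 2))
    lin_edges = np.linspace(0.0, sr / 2, n_filters + 2)
    return {
        "mfcc": cepstral_coeffs(
            power, triangular_filterbank(mel_edges, sr, n_fft), n_coeffs),
        "lfcc": cepstral_coeffs(
            power, triangular_filterbank(lin_edges, sr, n_fft), n_coeffs),
        "chroma": chroma_from_power(power, sr, n_fft),
    }

def fuse_by_concatenation(features):
    """Assumed stand-in for fusion: time-average each feature, then concatenate."""
    return np.concatenate([f.mean(axis=1) for f in features.values()])

# Usage: a one-second 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
features = extract_features(tone, sr=sr)
fused = fuse_by_concatenation(features)
```

The only difference between the MFCC and LFCC branches in this sketch is the spacing of the filter edges: mel spacing gives finer resolution at low frequencies, while linear spacing gives uniform resolution across the band. Chroma instead summarises energy by pitch class. A detector along the lines described above would presumably pass each time-frequency matrix to its own path rather than average over time as done here.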