STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection

Mayank Vatsa, Richa Singh, R. Ranjan
DOI: https://doi.org/10.1109/IJCB54206.2022.10007949
2022-10-10
Abstract: With the rise in mobile phone users and VoIP, voice has emerged as an easy and accessible biometric modality for identification and verification tasks. Given the increasing use of voice biometrics, the security of these systems is of paramount importance. Researchers have demonstrated that Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks such as synthetic or fake speech, which can be used maliciously for a variety of purposes such as impersonation, spreading fake news, and influencing opinion formation. This research proposes a deep convolutional multi-task network that performs both spoof detection and source identification for synthetic speech. The proposed model is evaluated on three datasets: ASVspoof2019 LA, FOR-Norm, and the In-the-Wild Audio Deepfake dataset. It achieves equal error rates (EERs) of 2.456%, 0.814%, and 0.199% on the ASVspoof2019 LA, FOR-Norm, and In-the-Wild Audio Deepfake datasets, respectively. In addition, we report results for cross-dataset evaluation and speech source identification.
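The results above are reported as equal error rate (EER): the operating point at which the false acceptance rate on spoofed trials equals the false rejection rate on bona fide trials. The following is a minimal sketch, not the authors' evaluation code, that approximates EER by sweeping a decision threshold over the observed scores. The function name `compute_eer`, the label convention (1 = bona fide, 0 = spoof), and the assumption that a higher score means "more likely bona fide" are illustrative choices. Official challenge tooling may compute EER differently (for example, with interpolation between operating points), so values can differ slightly.

```python
from typing import Sequence, Tuple


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the equal error rate by a threshold sweep.

    Assumed conventions (illustrative, not from the paper):
      labels: 1 = bona fide, 0 = spoof
      scores: higher means "more likely bona fide"
    Returns (eer, threshold) with eer as a fraction in [0, 1].
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_bona = sum(1 for y in labels if y == 1)
    n_spoof = len(labels) - n_bona
    if n_bona == 0 or n_spoof == 0:
        raise ValueError("need at least one bona fide and one spoof trial")

    best_gap = float("inf")
    best = (1.0, 0.0)
    # Candidate thresholds: every observed score, plus one above the maximum
    # (the point where everything is rejected).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        # False acceptance: spoof trials scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_spoof
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_bona
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2.0, t)
    return best


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
    # Multiply by 100 to express EER as a percentage, as in the abstract.
    print(f"EER = {eer * 100:.3f}% at threshold {thr}")
```

The sweep is quadratic in the number of trials, which keeps the logic easy to read; for large evaluation sets, a sorted single pass or a ROC-curve utility would be the more practical choice.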
Computer Science, Engineering