An improved feature extraction for Hindi language audio impersonation attack detection

Nidhi Chakravarty, Mohit Dua
DOI: https://doi.org/10.1007/s11042-023-18104-9
IF: 2.577
2024-01-25
Multimedia Tools and Applications
Abstract: Audio impersonation attacks pose a substantial risk to voice-based authentication systems and various speech recognition applications, so robust detection methods are needed to ensure system security and dependability. This paper presents a new approach to improving the front-end feature extraction of an audio impersonation attack detection system, specifically for the Hindi language. The proposed model is implemented in three main steps. First, Gammatone, Mel, and Acoustic Ternary Pattern Audio Features (TPAF) spectrograms are generated from the recorded audio samples. Second, an optimized Residual Network (ResNet27) is used to extract distinctive characteristics from these spectrograms. Finally, each of four binary classifiers, namely eXtreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbor (KNN), and Naïve Bayes (NB), is applied individually to the three spectrogram-ResNet27 feature combinations, yielding twelve distinct systems. All of these systems are evaluated on a self-created dataset for audio impersonation attacks, the Voice Impersonation Corpus in Hindi Language (VIHL). The proposed models are also evaluated on the ASVspoof 2019 and ASVspoof 2021 datasets for spoofing, impersonation, replay, and deepfake attacks. The results show that the Gammatone spectrogram-ResNet27 combination with the XGBoost classifier achieves an Equal Error Rate (EER) of 0.9% for impersonation attacks, outperforming existing techniques in identifying such attacks.
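The abstract reports its headline result as an Equal Error Rate (EER): the operating point at which the rate of falsely accepted attack samples equals the rate of falsely rejected genuine samples. As a minimal sketch (not the authors' evaluation code), EER can be approximated from a classifier's scores by sweeping every observed score as a decision threshold. The function name `compute_eer` and the convention that a higher score means "more likely genuine" are assumptions made for illustration; the scores in the usage example are hypothetical.

```python
from typing import Sequence


def compute_eer(bonafide_scores: Sequence[float], spoof_scores: Sequence[float]) -> float:
    """Approximate the Equal Error Rate (EER) from detection scores.

    Assumption (illustrative only): a higher score means "more likely genuine",
    and a sample is accepted as genuine when its score is >= the threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)
    best_gap, eer = float("inf"), 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection rate: genuine samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / n_bona
        # False acceptance rate: attack samples scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / n_spoof
        # Keep the threshold where FRR and FAR are closest; report their mean.
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.4, 0.6, 0.8, 0.9]  # hypothetical classifier scores
    attacks = [0.1, 0.3, 0.5, 0.7]  # hypothetical classifier scores
    print(f"EER = {compute_eer(genuine, attacks):.2%}")  # EER = 25.00%
```

On this scale, the reported 0.9% EER corresponds to a returned value of 0.009. The threshold sweep is deliberately simple; an evaluation pipeline might instead derive the false acceptance and false rejection curves from a ROC routine such as scikit-learn's `roc_curve` (where FRR = 1 - TPR) and interpolate between points.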
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering