Abstract: Voice-controlled devices are in demand because of their hands-free operation. However, using them in sensitive scenarios such as smartphone applications and financial transactions requires protection against fraudulent attacks known as "speech spoofing". The algorithms used in spoofing attacks are generally unknown in practice; hence, further analysis and development of spoof-detection models are required to improve spoof classification. A study of the spoofed-speech spectrum suggests that high-frequency features discriminate genuine speech from spoofed speech well. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferred over other speech features because of their lower covariance. Therefore, in this study, a Gaussian filter is proposed for extracting inverted MFCC (iMFCC) features, which capture high-frequency information. Complementary features are integrated with iMFCC to strengthen the features that help discriminate spoofed speech. Deep learning has proven efficient in classification applications, but the choice of hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify spoofed-speech signals, and we present an extensive investigation using the ASVspoof 2017 dataset. The optimized BiLSTM model was trained successfully in the fewest epochs and achieved a validation accuracy of 99.58%. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
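The Gaussian-filtered iMFCC idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the filter count, FFT size, sampling rate, or how the Gaussian width is chosen, so `n_filters`, `n_fft`, `sample_rate`, and `alpha` are hypothetical parameters. The sketch assumes one common formulation, in which mel-spaced points are mirrored around the Nyquist frequency so that filters are densest at high frequencies, and each Gaussian's standard deviation is the distance to the next point divided by `alpha`.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert mel values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gaussian_inverted_mel_filterbank(n_filters=20, n_fft=512,
                                     sample_rate=16000, alpha=2.0):
    """Gaussian filters on an inverted mel scale (all defaults are assumptions)."""
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # Mel-spaced points: n_filters centres plus two boundary points.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Mirror around Nyquist so the dense region moves to high frequencies.
    inv_hz = np.sort(nyquist - hz_points)
    centres = inv_hz[1:-1]
    # Assumed width rule: distance to the next point, scaled by alpha.
    sigmas = (inv_hz[2:] - inv_hz[1:-1]) / alpha
    freqs = np.linspace(0.0, nyquist, n_bins)
    z = (freqs[None, :] - centres[:, None]) / sigmas[:, None]
    bank = np.exp(-0.5 * z ** 2)  # shape: (n_filters, n_bins)
    return bank, centres

def imfcc(power_spectrum, bank, n_coeffs=13):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    energies = power_spectrum @ bank.T            # (frames, n_filters)
    log_energies = np.log(energies + 1e-10)       # small floor avoids log(0)
    n = bank.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (n_coeffs, n_filters)
    return log_energies @ basis.T                 # (frames, n_coeffs)

# Usage on random data standing in for a real power spectrogram.
bank, centres = gaussian_inverted_mel_filterbank()
rng = np.random.default_rng(0)
fake_power = rng.random((5, bank.shape[1]))
features = imfcc(fake_power, bank)
```

Unlike triangular filters, each Gaussian has non-zero weight at every frequency bin, which is one way to read the abstract's remark that a Gaussian filter captures more global information.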

STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection

End-to-end Spoofing Speech Detection and Knowledge Distillation under Noisy Conditions

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

Voice Presentation Attack Detection Using Convolutional Neural Networks

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Securing Voice Biometrics: One-Shot Learning Approach for Audio Deepfake Detection

Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning

Voice spoofing detection using a neural networks assembly considering spectrograms and mel frequency cepstral coefficients

Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning

Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification

MelCochleaGram-DeepCNN: Sequentially Fused Spectrogram and the DeepCNN Classifiers-based Audio Spoof Detection System

A blended framework for audio spoof detection with sequential models and bags of auditory bites

One-class Learning Towards Synthetic Voice Spoofing Detection

Bridging the Spoof Gap: A Unified Parallel Aggregation Network for Voice Presentation Attacks

Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection

Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

Voice spoofing detection with raw waveform based on Dual Path Res2net

Physiological-Physical Feature Fusion for Automatic Voice Spoofing Detection

DeepDet: YAMNet with BottleNeck Attention Module (BAM) TTS synthesis detection

Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects