Abstract: An automated speaker verification system uses speech recognition to verify the identity of a user and block illicit access. Logical access attacks are attempts to gain access to a system by tampering with its algorithms or data, or by circumventing security mechanisms. DeepFake attacks are a form of logical access threat that employs artificial intelligence to produce highly realistic audio clips of human voices, which may be used to circumvent voice authentication systems. This paper presents a framework for detecting Logical Access and DeepFake audio spoofing by integrating audio file components and time-frequency spectrogram representations into a lower-dimensional space using sequential prediction models. A bidirectional LSTM trained on the bona fide class generates significant one-dimensional features for both classes. The feature set is then standardized to a fixed set using a novel Bags of Auditory Bites (BoAB) feature standardizing algorithm. An Extreme Learning Machine then maps the feature space to predictions that differentiate between genuine and spoofed speech. The framework is evaluated on the ASVspoof 2021 dataset, a comprehensive collection of audio recordings designed for evaluating the robustness of speaker verification systems against spoofing attacks. It achieves favorable results on synthesized DeepFake attacks, with an Equal Error Rate (EER) of 1.18% in the best setting. Logical Access attacks were more challenging to detect, with an EER of 12.22%. Compared with the state of the art on the ASVspoof 2021 dataset, the proposed method improves the EER for DeepFake attacks by 95.16%.
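
The results above are reported as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how an EER is commonly estimated from detector scores. It is not the scoring tool used in the paper or in the ASVspoof challenge; the function name `equal_error_rate`, the label convention (1 = bona fide, 0 = spoof), and the toy scores are assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and its threshold from detector scores.

    Assumed convention (not taken from the paper): a higher score means
    "more likely bona fide"; labels use 1 for bona fide and 0 for spoof.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona_fide = scores[labels == 1]
    spoof = scores[labels == 0]

    # Every distinct score is a candidate decision threshold.
    thresholds = np.unique(scores)
    # False acceptance: a spoofed clip scores at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection: a bona fide clip scores below the threshold.
    frr = np.array([(bona_fide < t).mean() for t in thresholds])

    # With finitely many scores the two rates may never be exactly equal,
    # so take the threshold where they are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores for illustration only (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this toy example one spoofed clip (score 0.6) outranks one bona fide clip (score 0.4), so at threshold 0.6 one of four spoofed clips is accepted and one of four bona fide clips is rejected. Evaluation toolkits usually interpolate between thresholds rather than picking the closest pair, so values on real data can differ slightly from this sketch.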

Robust Audio Anti-Spoofing System Based on Low-Frequency Sub-Band Information

End-to-end Spoofing Speech Detection and Knowledge Distillation under Noisy Conditions

Ghost-in-Wave: How Speaker-Irrelative Features Interfere DeepFake Voice Detectors

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

Audio Anti-Spoofing Detection: A Survey

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms

Source Tracing of Audio Deepfake Systems

A blended framework for audio spoof detection with sequential models and bags of auditory bites

Acoustic features analysis for explainable machine learning-based audio spoofing detection

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection

A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection

Fully Automated End-to-End Fake Audio Detection

Transferring Audio Deepfake Detection Capability Across Languages

Waveform Boundary Detection for Partially Spoofed Audio

Speaker Recognition-Assisted Robust Audio Deepfake Detection

ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild

STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection

A Comparative Study on Physical and Perceptual Features for Deepfake Audio Detection

A lightweight feature extraction technique for deepfake audio detection