Abstract: In our everyday lives, we communicate with one another through many channels, and communication is central to human life. Listening and speaking are its primary forms, and both depend on the human voice, which makes voice the simplest means of communication. An Automatic Speaker Verification (ASV) system verifies users by their voices. Such systems are susceptible to voice spoofing attacks, which are divided into logical access and physical access attacks. Recently, there has been notable progress in detecting these attacks. In a replay attack, an attacker uses advanced recording devices to capture a user’s voice, replays the recording to the ASV system, and is granted access for harmful purposes. In this work, we propose a secure voice spoofing countermeasure to detect voice replay attacks. We enhance the security of the ASV system by building a spoofing countermeasure that relies on decomposed signals containing the most prominent information. We use two main features, Gammatone Cepstral Coefficients (GTCC) and Mel-Frequency Cepstral Coefficients (MFCC), for the audio representation. To classify the features, we use a Bi-directional Long Short-Term Memory (BiLSTM) network, a deep learning classifier, running in the cloud. We investigate numerous audio features and examine each feature’s ability to capture the details most relevant to labelling the audio as genuine or spoofed speech. Furthermore, we compare our system against various traditional machine learning classifiers to illustrate its superiority. The experimental results are reported in terms of accuracy, precision, recall, F1-score, and Equal Error Rate (EER), which were 97%, 100%, 90.19%, 94.84%, and 2.95%, respectively.
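The reported metrics are related to one another, and a short sketch can make that concrete. The Python below is an illustration only, not the authors' evaluation code: the function names `f1_score` and `equal_error_rate` are hypothetical, the toy scores are invented for the example, and the convention that a higher score means "more likely genuine" is an assumption. It checks that the reported F1-score is the harmonic mean of the reported precision and recall, and shows one simple way an EER could be approximated from classifier scores.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over the observed scores.

    Assumption: higher score means "more likely genuine". The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, as the mean of the two rates.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Precision 100% and recall 90.19%, as reported in the abstract.
print(round(100 * f1_score(1.00, 0.9019), 2))  # prints 94.84

# Toy scores, invented purely for illustration.
toy_genuine = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(toy_genuine, toy_spoof))  # prints 0.25
```

The first printed value matches the 94.84% F1-score quoted above, so the precision, recall, and F1 figures are mutually consistent. Accuracy and EER cannot be rechecked this way without the underlying trial counts and scores.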

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning
