Abstract: Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have raised growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from the audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech in the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g., speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SUPERVOICE, that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second spoofed voice dataset to evaluate its security. To balance controlled recordings against real-world applications, the audio recordings are collected in two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SUPERVOICE achieves a 0.58% equal error rate in the speaker verification task and takes only 120 ms to test an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms of processing time, SUPERVOICE achieves a 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers.
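The equal error rate (EER) reported in the abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. As a minimal sketch of how such a figure can be computed from verification scores, the snippet below sweeps every observed score as a candidate threshold and returns the point where the two rates are closest. The function name `equal_error_rate` and the toy scores are illustrative assumptions, not taken from the paper, and the paper's own evaluation procedure may differ.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumption: a higher score means "more likely the claimed speaker",
    and a trial is accepted when score >= threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up similarity scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite score sets the two rates rarely cross exactly, so this sketch reports the midpoint at the closest threshold; interpolating along the ROC curve is a common alternative.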

Stop Deceiving! An Effective Defense Scheme Against Voice Impersonation Attacks on Smart Devices

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

UltraBD: Backdoor Attack against Automatic Speaker Verification Systems via Adversarial Ultrasound

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning

Voice Presentation Attack Detection Using Convolutional Neural Networks

You Can Hear But You Cannot Steal: Defending Against Voice Impersonation Attacks on Smartphones

Recognizing Voice Spoofing Attacks Via Acoustic Nonlinearity Dissection for Mobile Devices

Voiceprint Mimicry Attack Towards Speaker Verification System in Smart Home

One-class Learning Towards Synthetic Voice Spoofing Detection

VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

When Automatic Voice Disguise Meets Automatic Speaker Verification

The defender's perspective on automatic speaker verification: An overview

Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues

To what extent can ASV systems naturally defend against spoofing attacks?

Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward

Defending Adversarial Attacks on Cloud-aided Automatic Speech Recognition Systems

Defend Data Poisoning Attacks on Voice Authentication

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech