Abstract: Automatic speaker verification (ASV) systems are widely used in voice user interfaces for person identification and access control via voiceprints. A typical ASV system consists of three stages: training, enrollment, and verification. Previous work has shown that an ASV system can be bypassed at the training stage by backdoor attacks and at the verification stage by adversarial example attacks. In this paper, we propose a new type of backdoor attack that targets the enrollment stage via adversarial ultrasound, named UltraBD, which is highly imperceptible, synchronization-free, and content-independent. By injecting ultrasound backdoor examples while the legitimate user performs enrollment, the attack pollutes the voiceprint stored in the ASV system so that it grants access to both the legitimate user and the adversary with relatively high confidence. The main challenge is that when, what, and how the legitimate user speaks at the enrollment stage is highly unpredictable and varied. We address this by augmenting the generation and optimization of the ultrasound backdoor examples with randomness in the synchronization time and the relative amplitude ratio. Furthermore, we optimize the modulation mechanism of adversarial ultrasound by tuning the baseband signal on a limited set of signal frequency points to improve its robustness in physical-world settings. We validate UltraBD on two common datasets together with two open-source ASV models. Results show that UltraBD is robust to various configurations, e.g., different speakers and utterance content. In sum, our attack calls attention to a new attack surface of ASV systems and sheds light on its fundamental mechanisms.
