Abstract: Automatic speech recognition (ASR) systems have been shown to be vulnerable to adversarial examples (AEs). Recent successful attacks all assume that users will not notice or disrupt the attack process, despite the music- or noise-like sounds involved and the spontaneous responses from voice assistants. In practical user-present scenarios, however, user awareness may nullify existing attacks that involve unexpected sounds or ASR usage. In this paper, we seek to bridge this gap in existing research and extend the attack to user-present scenarios. We propose VRIFLE, an inaudible adversarial perturbation (IAP) attack via ultrasound delivery that can manipulate ASRs as a user speaks. The inherent differences between audible sounds and ultrasounds mean that IAP delivery faces unprecedented challenges such as distortion, noise, and instability. To address these challenges, we design a novel ultrasonic transformation model that makes the crafted perturbation physically effective and able to survive even long-distance delivery. We further improve VRIFLE's robustness by applying a series of augmentations for user and real-world variations during the generation process. In this way, VRIFLE achieves effective real-time manipulation of the ASR output from different distances and under any user speech, with an alter-and-mute strategy that suppresses the impact of user disruption. Our extensive experiments in both the digital and physical worlds verify VRIFLE's effectiveness under various configurations, its robustness against six kinds of defenses, and its universality in a targeted manner. We also show that VRIFLE can be delivered with a portable attack device and even everyday-life loudspeakers.

Adversarial Perturbation Prediction for Real-Time Protection of Speech Privacy

Echo: Reverberation-based Fast Black-Box Adversarial Attacks on Intelligent Audio Systems

On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection

UniAP: Protecting Speech Privacy with Non-Targeted Universal Adversarial Perturbations

Adversarial speech for voice privacy protection from Personalized Speech generation

Attack on Practical Speaker Verification System Using Universal Adversarial Perturbations

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Adversarial Privacy Protection on Speech Enhancement

VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

Mitigating Unauthorized Speech Synthesis for Voice Protection

Defending Adversarial Attacks on Cloud-aided Automatic Speech Recognition Systems

BypTalker: an Adaptive Adversarial Example Attack to Bypass Prefilter-enabled Speaker Recognition

Universal Adversarial Perturbations for Speech Recognition Systems

Defending Against Adversarial Attacks in Speaker Verification Systems

Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding

Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

Push the Limit of Adversarial Example Attack on Speaker Recognition in Physical Domain

Adversarial Representation Learning for Robust Privacy Preservation in Audio

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

Imperceptible Black-Box Waveform-Level Adversarial Attack Towards Automatic Speaker Recognition

Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning