Abstract: Automatic speech recognition (ASR) systems are vulnerable to audio adversarial examples, which deceive an ASR system by adding perturbations to benign speech signals. These examples appear indistinguishable from benign audio, yet the ASR system decodes them as the attacker's intended malicious commands. Previous studies have demonstrated the feasibility of such attacks in simulated environments (over-line) and have also produced robust physical audio adversarial examples (over-air). Various defense techniques have been proposed to counter these attacks, but most of them either fail to handle the different attack types effectively or incur significant time overhead. In this paper, we propose a novel method for detecting audio adversarial examples. Our approach feeds both a smoothed version and the original version of the audio input into the ASR system, and then adds noise to the logits before they are passed to the ASR decoder. We demonstrate that carefully selected noise can considerably change the transcription of audio adversarial examples while having minimal impact on the transcription of benign audio. Leveraging this property, we detect audio adversarial examples by comparing the transcription altered by logit noising with the original transcription. The proposed method can be applied to existing ASR systems without any structural modification or additional training. Experimental results indicate that the proposed method is robust against both over-line and over-air audio adversarial examples and outperforms state-of-the-art detection methods.
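The detection idea in the abstract (perturb the logits, decode again, and compare against the original transcription) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian noise, the greedy CTC-style decoder, the toy alphabet, the character error rate (CER) comparison, and the threshold value are all assumptions made for illustration, and the audio-smoothing step is omitted. All function names are hypothetical.

```python
import random

BLANK = 0          # assumed CTC blank index
ALPHABET = "_abc"  # toy alphabet for illustration; index 0 is the blank


def greedy_ctc_decode(logits):
    """Take the argmax of each frame, collapse repeats, and drop blanks."""
    out = []
    prev = BLANK
    for frame in logits:
        idx = max(range(len(frame)), key=frame.__getitem__)
        if idx != prev and idx != BLANK:
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)


def add_logit_noise(logits, sigma, rng):
    """Add zero-mean Gaussian noise to every logit (the noise type is an assumption)."""
    return [[v + rng.gauss(0.0, sigma) for v in frame] for frame in logits]


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_flagged(logits, sigma=1.0, threshold=0.3, trials=5, seed=0):
    """Flag the input when noisy transcriptions drift far from the original.

    The mean CER over several noisy decodes is compared with a
    hypothetical threshold; both the statistic and the value are assumptions.
    """
    rng = random.Random(seed)
    original = greedy_ctc_decode(logits)
    total_cer = 0.0
    for _ in range(trials):
        noisy = greedy_ctc_decode(add_logit_noise(logits, sigma, rng))
        total_cer += edit_distance(original, noisy) / max(len(original), 1)
    return total_cer / trials > threshold
```

In this sketch, an input whose per-frame logits have a large margin keeps its transcription under noise and is not flagged, whereas an input whose winning logits barely exceed the alternatives changes transcription and is flagged. Whether real adversarial examples exhibit such small margins is the paper's empirical claim, not something this toy code establishes.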
