Abstract: Benefiting from advances in big data, edge computing, and deep learning, automatic speech recognition (ASR) has made remarkable breakthroughs in recent years. As a result, more and more smart products have adopted speech as the interface for human-computer interaction, which has driven the popularity of edge intelligence (EI) enhanced automatic speech recognition. While people enjoy the social changes brought by speech recognition technology, a source of instability has quietly emerged: the audio adversarial example, a type of audio deliberately generated by attackers by adding subtle perturbations to the original audio signal. The added perturbations are noise-like and cannot be perceived by humans, yet they cause the ASR system to produce a wrong transcription. Three detection algorithms for audio adversarial examples are proposed in this thesis, namely, the robust detection algorithm based on WER (word error rate), the feature detection algorithm based on ADR (adversarial ratio), and the collaborative detection algorithm based on a neural network. The experimental results show that the three proposed detection algorithms discriminate audio adversarial examples well and achieve high AUC scores. Among them, collaborative detection performs best and feature detection performs worst. In addition, we found that the robust detection algorithm tends to have a higher accuracy score but a lower recall score, while the feature detection algorithm tends to show the converse. Moreover, since the proposed collaborative detection method combines the advantages of the robust detection and feature detection methods, it performs better with respect to accuracy, recall, and F1 score.
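To make the WER metric named above concrete, the following is a minimal Python sketch of word error rate computed as word-level edit distance divided by the number of reference words. The abstract does not describe how the robust detection algorithm uses WER, so the helper `flag_if_unstable`, its `threshold` value, and the idea of comparing two transcriptions of the same audio are illustrative assumptions, not the thesis's method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # WER is undefined for an empty reference; this convention is a choice made here.
        return 0.0 if not hyp else 1.0

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def flag_if_unstable(first_transcript: str,
                     second_transcript: str,
                     threshold: float = 0.3) -> bool:
    """Hypothetical detector: flag the input when two transcriptions of the
    same audio disagree strongly. The threshold is an assumed value."""
    return word_error_rate(first_transcript, second_transcript) > threshold


if __name__ == "__main__":
    print(word_error_rate("open the front door", "open a door"))
    print(flag_if_unstable("open the front door", "open a door"))
```

In this sketch, a large WER between two transcriptions is treated as a sign of instability; in practice any threshold would need to be tuned on labeled benign and adversarial audio.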

Defending and Detecting Audio Adversarial Example Using Frame Offsets

Echo: Reverberation-based Fast Black-Box Adversarial Attacks on Intelligent Audio Systems

Understanding and Benchmarking the Commonality of Adversarial Examples

Defending Adversarial Attacks on Cloud-aided Automatic Speech Recognition Systems

Defending against Adversarial Audio via Diffusion Model

The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems

Toward Robust ASR System against Audio Adversarial Examples using Agitated Logit

Towards the Universal Defense for Query-Based Audio Adversarial Attacks

Adversarial Examples Attack and Countermeasure for Speech Recognition System: A Survey

Adversarial Examples for Automatic Speech Recognition: Attacks and Countermeasures

Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV

Towards Resistant Audio Adversarial Examples

An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

A Detection Algorithm for Audio Adversarial Examples in EI-Enhanced Automatic Speech Recognition

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Adversarial Example Attacks Against ASR Systems: an Overview

Robustifying automatic speech recognition by extracting slowly varying features

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition

Watch Your Speed: Injecting Malicious Voice Commands via Time-Scale Modification