Abstract: Deep neural networks (DNNs) are widely used for image recognition, speech recognition, and other pattern analysis tasks. Despite their success, these systems can be exploited with adversarial examples. An adversarial example adds a small distortion to the input data so that the DNN misclassifies it while the change remains undetected by humans or other systems. Such adversarial examples have been studied mainly in the image domain; recently, however, research has been expanding into the voice domain. For example, when an adversarial example is applied to enemy wiretapping devices (victim classifiers) in a military environment, the enemy device will misinterpret the intended message. In such scenarios, friendly wiretapping devices (protected classifiers) must not be deceived. The selective adversarial example concept is therefore useful in mixed situations, in which there is both a classifier to be protected and a classifier to be attacked. In this paper, we propose a selective audio adversarial example with minimum distortion that is misclassified as the target phrase by a victim classifier but correctly classified as the original phrase by a protected classifier. To generate such examples, a transformation is carried out to minimize the probability of incorrect classification by the protected classifier and that of correct classification by the victim classifier. We conducted experiments targeting the state-of-the-art DeepSpeech voice recognition model using Mozilla Common Voice datasets and the TensorFlow library. The results show that the proposed method can generate a selective audio adversarial example with a 91.67% attack success rate and 85.67% protected classifier accuracy.
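
The objective described in the abstract combines three terms: the distortion, a loss that keeps the protected classifier on the original phrase, and a loss that pushes the victim classifier toward the target phrase. The following is a minimal sketch of such a combined loss, not the paper's implementation. It assumes each classifier returns a probability vector over a small set of classes (the paper's experiments target a speech-to-text model, which would likely need a sequence loss instead), and the function name `selective_loss`, the squared L2 distortion, and the weight `c` are hypothetical choices made for illustration.

```python
import numpy as np


def selective_loss(x, x_adv, p_protected, p_victim, y_orig, y_target, c=1.0):
    """Illustrative combined objective for a selective adversarial example.

    x, x_adv     : original and perturbed waveforms (1-D arrays, same length)
    p_protected  : class probabilities from the protected classifier on x_adv
    p_victim     : class probabilities from the victim classifier on x_adv
    y_orig       : index of the original (correct) class
    y_target     : index of the attacker's target class
    c            : assumed weight balancing distortion against the two losses
    """
    eps = 1e-12  # avoids log(0)

    # Distortion term: squared L2 distance between the two waveforms (assumption).
    distortion = float(np.sum((np.asarray(x_adv) - np.asarray(x)) ** 2))

    # Protected classifier should still output the original class.
    loss_protected = -float(np.log(p_protected[y_orig] + eps))

    # Victim classifier should output the target class.
    loss_victim = -float(np.log(p_victim[y_target] + eps))

    return distortion + c * (loss_protected + loss_victim)


if __name__ == "__main__":
    x = np.zeros(4)
    x_adv = x + 0.1
    good = selective_loss(x, x_adv, np.array([0.9, 0.1]), np.array([0.2, 0.8]), 0, 1)
    bad = selective_loss(x, x_adv, np.array([0.3, 0.7]), np.array([0.8, 0.2]), 0, 1)
    print(good < bad)
```

In an actual attack, this scalar would be minimized with respect to the perturbation by gradient descent through both classifiers; the sketch only shows how the three terms fit together.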

Audio Steganography with Speech Recognition System

Echo: Reverberation-based Fast Black-Box Adversarial Attacks on Intelligent Audio Systems

The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems

Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

Adversarial Examples Against Deep Neural Network based Steganalysis

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Enhancing the Security of Deep Learning Steganography via Adversarial Examples

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Direct Adversarial Attack on Stego Sandwiched Between Black Boxes

Selective Audio Adversarial Example in Evasion Attack on Speech Recognition System

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Adversarial Privacy Protection on Speech Enhancement

Defending Adversarial Attacks on Cloud-aided Automatic Speech Recognition Systems

SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems

Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise

AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception

Towards Resistant Audio Adversarial Examples

Deep Residual Neural Networks for Image in Speech Steganography

Detection Based Defense Against Adversarial Examples from the Steganalysis Point of View