Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

Zhe Ye, Terui Mao, Li Dong, Diqun Yan

DOI: https://doi.org/10.21437/Interspeech.2023-733

2023-06-28

Abstract: Deep speech classification has achieved tremendous success and greatly promoted the emergence of many real-world applications. However, backdoor attacks present a new security threat to it, particularly with untrustworthy third-party platforms, as pre-defined triggers set by the attacker can activate the backdoor. Most of the triggers in existing speech backdoor attacks are sample-agnostic, and even if the triggers are designed to be unnoticeable, they can still be audible. This work explores a backdoor attack that utilizes sample-specific triggers based on voice conversion. Specifically, we adopt a pre-trained voice conversion model to generate the trigger, ensuring that the poisoned samples do not introduce any additional audible noise. Extensive experiments on two speech classification tasks demonstrate the effectiveness of our attack. Furthermore, we analyze the specific scenarios that activate the proposed backdoor and verify its resistance against fine-tuning.
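The poisoning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `poison_dataset` and `convert_voice` are hypothetical names, waveforms are plain lists of floats, and `convert_voice` is a placeholder for the pre-trained voice conversion model. It also assumes a standard poison-label setup in which a fraction of training utterances is converted and relabelled to the attacker's target class. Because the trigger is whatever the conversion produces for each utterance, it differs from sample to sample, which is what "sample-specific" means here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Sample = Tuple[Waveform, int]


def poison_dataset(
    dataset: Sequence[Sample],
    convert_voice: Callable[[Waveform], Waveform],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> Tuple[List[Sample], List[int]]:
    """Hypothetical poison-label sketch.

    Picks round(poison_rate * len(dataset)) samples whose label is not
    `target_label`, passes each through `convert_voice` (a placeholder for a
    pre-trained voice conversion model), and relabels it to `target_label`.
    Returns the new dataset and the sorted indices that were poisoned.
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)
    # Only samples outside the target class are worth poisoning.
    candidates = [i for i, (_, y) in enumerate(dataset) if y != target_label]
    n_poison = min(int(round(poison_rate * len(dataset))), len(candidates))
    chosen = set(rng.sample(candidates, n_poison))

    poisoned: List[Sample] = []
    for i, (wave, label) in enumerate(dataset):
        if i in chosen:
            # The "trigger" is the converted voice itself, not an added noise.
            poisoned.append((convert_voice(list(wave)), target_label))
        else:
            poisoned.append((list(wave), label))
    return poisoned, sorted(chosen)


if __name__ == "__main__":
    # Toy data; the scaling lambda only stands in for a real conversion model.
    data = [([0.1 * k, 0.2], k % 3) for k in range(10)]
    new_data, idx = poison_dataset(
        data, lambda w: [0.5 * x for x in w], target_label=0, poison_rate=0.3
    )
    print("poisoned indices:", idx)
```

The conversion model is passed in as a callable so the poisoning logic stays independent of any particular voice conversion system; in practice it would wrap whichever pre-trained model is used.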

Sound, Cryptography and Security, Machine Learning, Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses backdoor attacks on deep speech classification models. Specifically, it studies how voice conversion can be used to generate sample-specific triggers for a backdoor attack: once the backdoor is implanted, inputs carrying the trigger cause the model to make the incorrect predictions the attacker intends. Unlike most existing speech backdoor attacks, whose triggers are sample-agnostic and can remain audible even when designed to be unnoticeable, this attack embeds the trigger in each utterance through voice conversion and introduces no additional audible noise. This makes the attack more covert and harder to detect and defend against.

The paper's main contribution is a new backdoor attack method based on voice conversion. Experiments on two speech classification tasks show that the attack is effective: the implanted backdoor is successfully activated, steering the model toward the attacker's intended incorrect predictions, while the model's accuracy on clean samples is maintained. The attack is also stealthy and resists fine-tuning. In addition, the study analyzes the specific scenarios in which the backdoor is activated, providing an important reference for understanding and preventing such attacks.
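The two properties the summary emphasizes, preserved accuracy on clean samples and reliable activation of the backdoor, are commonly measured as clean accuracy and attack success rate (ASR). The sketch below assumes a `model` callable that maps a waveform to a predicted label; the function names are hypothetical, and excluding samples already in the target class from the ASR is a common convention rather than something stated in this summary.

```python
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Sample = Tuple[Waveform, int]
Model = Callable[[Waveform], int]


def clean_accuracy(model: Model, clean_set: Sequence[Sample]) -> float:
    """Fraction of unmodified samples the model labels correctly."""
    if not clean_set:
        raise ValueError("clean_set is empty")
    correct = sum(1 for wave, y in clean_set if model(wave) == y)
    return correct / len(clean_set)


def attack_success_rate(
    model: Model,
    clean_set: Sequence[Sample],
    convert_voice: Callable[[Waveform], Waveform],
    target_label: int,
) -> float:
    """Fraction of non-target samples that, after applying the (placeholder)
    voice conversion trigger, are classified as `target_label`."""
    eligible = [(w, y) for w, y in clean_set if y != target_label]
    if not eligible:
        raise ValueError("no samples outside the target class")
    hits = sum(1 for w, _ in eligible if model(convert_voice(w)) == target_label)
    return hits / len(eligible)


if __name__ == "__main__":
    # Toy stand-ins: a "model" that outputs class 0 for negative inputs,
    # and a sign flip playing the role of the conversion trigger.
    clean = [([1.0], 1), ([2.0], 2), ([0.0], 0), ([1.0], 2)]
    model = lambda w: int(w[0]) if w[0] >= 0 else 0
    flip = lambda w: [-x for x in w]
    print(clean_accuracy(model, clean), attack_success_rate(model, clean, flip, 0))
```

Comparing both numbers before and after fine-tuning the backdoored model on clean data is one way to quantify the fine-tuning resistance the summary mentions.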

The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems

Echo: Reverberation-based Fast Black-Box Adversarial Attacks on Intelligent Audio Systems

UltraBD: Backdoor Attack against Automatic Speaker Verification Systems via Adversarial Ultrasound

EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion

Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview

Voice Presentation Attack Detection Using Convolutional Neural Networks

PhoneyTalker: an Out-of-the-Box Toolkit for Adversarial Example Attack on Speaker Recognition

Defending Your Voice: Adversarial Attack on Voice Conversion

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning

MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

Adversarial Example Detection by Classification for Deep Speech Recognition

Voiceprint Mimicry Attack Towards Speaker Verification System in Smart Home