Abstract: Automatic Speaker Recognition Systems (SRSs) are widely used in voice applications for personal identification and access control. A typical SRS consists of three stages: training, enrollment, and recognition. Previous work has shown that SRSs can be bypassed by backdoor attacks at the training stage or by adversarial example attacks at the recognition stage. In this paper, we propose Tuner, a new type of backdoor attack against the enrollment stage of an SRS via adversarial ultrasound modulation. The attack is inaudible, synchronization-free, content-independent, and black-box. Our key idea is to inject the backdoor into the SRS with modulated ultrasound while a legitimate user performs enrollment; afterward, the polluted SRS grants access to both the legitimate user and the adversary with high confidence. The major challenge for our attack is that the user's articulation at the enrollment stage is unpredictable. To overcome this challenge, we generate the ultrasonic backdoor by augmenting the optimization process with random speech content, vocalizing time, and volume of the user. Furthermore, to achieve real-world robustness, we improve the ultrasonic signal over traditional methods using sparse frequency points, pre-compensation, and single-sideband (SSB) modulation. We extensively evaluate Tuner on two common datasets and seven representative SRS models, and test its robustness against seven kinds of defenses. Results show that our attack can successfully bypass speaker recognition systems while remaining effective across different speakers and speech content. To help mitigate this newly discovered threat, we also discuss potential countermeasures, limitations, and future work.
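The augmentation idea above (random speech content, vocalizing time, and volume) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `augment_enrollment`, the `gain_range` bounds, and the treatment of the backdoor as a list of already-demodulated baseband samples are all assumptions made for illustration only.

```python
import random

def augment_enrollment(trigger, speech_clips, rng, gain_range=(0.5, 1.5)):
    """Mix a fixed trigger with one randomly drawn enrollment condition.

    trigger      -- list of float samples (hypothetical baseband backdoor signal)
    speech_clips -- list of sample lists standing in for arbitrary user speech
    rng          -- random.Random instance, passed in for reproducibility
    gain_range   -- assumed bounds on the user's speaking volume
    """
    speech = rng.choice(speech_clips)        # random speech content
    onset = rng.randrange(len(trigger))      # random vocalizing time
    gain = rng.uniform(*gain_range)          # random volume
    mixed = list(trigger)                    # copy so the trigger is not modified
    for i, sample in enumerate(speech):
        j = onset + i
        if j >= len(mixed):
            break                            # speech past the trigger window is cut off
        mixed[j] += gain * sample
    return mixed, onset, gain

# Draw several random enrollment conditions for the same trigger.
rng = random.Random(0)
trigger = [0.0] * 8
clips = [[1.0, 1.0], [0.5, -0.5, 0.5]]
batch = [augment_enrollment(trigger, clips, rng) for _ in range(4)]
```

In an optimization loop, one would presumably average the attack loss over such a batch, so that the resulting backdoor does not depend on any single utterance, onset, or loudness.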

Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
