Abstract: Automatic speaker verification (ASV) systems are widely used in voice user interfaces for person identification and access control via voiceprints. A typical ASV system operates in three stages: training, enrollment, and verification. Previous work has shown that an ASV system can be bypassed at the training stage by backdoor attacks and at the verification stage by adversarial example attacks. In this paper, we propose UltraBD, a new type of backdoor attack that targets the enrollment stage via adversarial ultrasound and is highly imperceptible, synchronization-free, and content-independent. The adversary injects ultrasound backdoor examples while the legitimate user is enrolling, so that the polluted voiceprint stored in the ASV system grants access to both the legitimate user and the adversary with relatively high confidence. The main challenge is that when, what, and how the legitimate user speaks during enrollment is unpredictable and highly variable. We address this by augmenting the generation and optimization of the ultrasound backdoor examples with randomness in the synchronous time and the relative amplitude ratio. Furthermore, we optimize the modulation mechanism of the adversarial ultrasound by tuning the baseband signal at a limited number of signal frequency points, which improves its robustness in physical-world settings. We validate UltraBD on two common datasets and two open-source ASV models. Results show that UltraBD is robust to various configurations, e.g., different speakers and utterance content. In sum, our attack calls attention to a new attack surface of ASV systems and sheds light on its fundamental mechanisms.

UltraBD: Backdoor Attack against Automatic Speaker Verification Systems via Adversarial Ultrasound
