Push the Limit of Adversarial Example Attack on Speaker Recognition in Physical Domain

Qianniu Chen,Meng Chen,Li Lu,Jiadi Yu,Yingying Chen,Zhibo Wang,Zhongjie Ba,Feng Lin,Kui Ren
DOI: https://doi.org/10.1145/3560905.3568518
2022-01-01
Abstract:The integration of deep learning on Speaker Recognition (SR) advances its development and wide deployment, but also introduces the emerging threat of adversarial examples. However, only a few existing studies investigate its practical threat in physical domain, which either evaluate its feasibility only by directly replaying generated adversarial examples, or explore the partial channel interference for robustness improvement. In this paper, we propose a physical adversarial example attack, PhyTalker , which could generate and inject perturbations on voices in a live-streaming manner on attacking various SR models in different physical channels. Compared with the typical adversarial example for digital attacks, PhyTalker generates a subphoneme-level perturbation dictionary to decouple the perturbation optimization and injection. Moreover, we introduce the channel augmentation to compensate both device and environmental distortions, as well as model ensemble to improve the perturbation transferability. Finally, PhyTalker recognizes and localizes the latest recorded phoneme to determine the corresponding perturbations for real-time broadcasting. Extensive experiments are conducted with a large-scale corpus in real physical scenarios, and results show that PhyTalker achieves an overall Attack Success Rate (ASR) of 85.5% in attacking mainstream SR systems and Mel Cepstral Distortion (MCD) of 2.45dB in human audibility.
What problem does this paper attempt to address?