SUETA: Speaker-specific Utterance Ensemble Based Transfer Attack on Speaker Identification System

Chu-Xiao Zuo, Jia-Yi Leng, Wu-Jun Li
DOI: https://doi.org/10.1016/j.cose.2024.103948
2022-01-01
Abstract: While speaker identification (SI) systems based on deep neural networks (DNNs) have been widely applied in security-related practical tasks, the robustness of SI systems against potential malicious threats has attracted increasing attention. Existing works have shown that white-box attacks can greatly threaten current SI systems, but white-box attacks require complete knowledge of the target model, which is impractical in many applications. As far as we know, only a few works have studied the more practical black-box attacks, and these attacks are mostly ported from computer vision tasks and lack adaptability to speech data. In this work, we propose a novel black-box attack, called speaker-specific utterance ensemble based transfer attack (SUETA). SUETA exploits a unique characteristic of speech data, namely that different utterances of one specific speaker share the same voiceprint, to attack SI systems. To the best of our knowledge, SUETA is the first black-box attack on SI systems that utilizes this characteristic of speech data. Experimental results on three representative SI models show that SUETA achieves a better transfer success rate (TSR) than speaker-unrelated baselines. Furthermore, SUETA can even improve the attack success rate (ASR) of the white-box attack on the local substitute model, which is the first step of a transfer-based black-box attack.