Fast speech adversarial example generation for keyword spotting system with conditional GAN

Donghua Wang, Li Dong, Rangding Wang, Diqun Yan
DOI: https://doi.org/10.1016/j.comcom.2021.08.010
IF: 5.047
2021-11-01
Computer Communications
Abstract: Deep network-based keyword spotting (KWS) has achieved great success in many speech assistant applications. However, such network-based KWS systems have been shown to be vulnerable to adversarial attacks. In this work, we propose to utilize a conditional generative adversarial network (CGAN) to efficiently craft targeted speech adversarial examples. Specifically, we first transform the attack target label into a vector, which serves as the condition input of the CGAN. The generator in the CGAN is tasked with generating a perturbation that causes the adversarial example to be misclassified as the pre-specified target keyword, while simultaneously deceiving the discriminator into classifying the adversarial example as genuine. The discriminator aims to distinguish the crafted adversarial examples from legitimate samples. Secondly, the target network-based KWS classifier(s) are ensembled and integrated into the proposed CGAN framework to force the generator to construct model-independent perturbations. The classification error loss of the target KWS is back-propagated as gradients to guide the weight update of the generator. Finally, with a properly devised network architecture and training procedure, we obtain a well-trained generator that produces the adversarial perturbation for a given speech clip and target label. Experimental results show that the crafted adversarial examples can effectively attack the state-of-the-art KWS system with a high attack success rate while attaining acceptable perceptual quality.
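To make the interplay of the objectives in the abstract concrete, here is a minimal, framework-free sketch of how the losses might be composed. It is not the authors' implementation. Several details are assumptions not stated in the abstract: the one-hot condition vector, the non-saturating GAN loss, the L-infinity clip `eps` on the perturbation, the unweighted mean over the ensemble, and the weight `lam`. All function names are hypothetical. Networks and gradient computation are omitted. The discriminator output is represented only by a probability.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: a KWS classifier maps a waveform to a list of logits.
Classifier = Callable[[Sequence[float]], List[float]]


def one_hot(target: int, num_classes: int) -> List[float]:
    """Encode the attack target label as a condition vector (one-hot is an assumption)."""
    if not 0 <= target < num_classes:
        raise ValueError("target label out of range")
    return [1.0 if i == target else 0.0 for i in range(num_classes)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def apply_perturbation(x: Sequence[float], delta: Sequence[float], eps: float) -> List[float]:
    """Add the generator's perturbation, clipped to [-eps, eps] (assumed L-inf bound),
    and keep the adversarial waveform in the normalized range [-1, 1]."""
    return [max(-1.0, min(1.0, xi + max(-eps, min(eps, di)))) for xi, di in zip(x, delta)]


def ensemble_target_loss(classifiers: Sequence[Classifier], x_adv: Sequence[float], target: int) -> float:
    """Mean cross-entropy toward the target keyword over the ensembled KWS models."""
    losses = [-log_softmax(f(x_adv))[target] for f in classifiers]
    return sum(losses) / len(losses)


def discriminator_loss(d_real: float, d_fake: float, tiny: float = 1e-12) -> float:
    """Standard GAN loss: D should output ~1 on legitimate clips and ~0 on adversarial ones."""
    return -math.log(max(d_real, tiny)) - math.log(max(1.0 - d_fake, tiny))


def generator_loss(d_fake: float, classifiers: Sequence[Classifier], x_adv: Sequence[float],
                   target: int, lam: float = 1.0, tiny: float = 1e-12) -> float:
    """Non-saturating GAN term (fool D into calling x_adv genuine) plus a weighted
    targeted-misclassification term from the ensemble; lam is a hypothetical weight."""
    gan_term = -math.log(max(d_fake, tiny))
    return gan_term + lam * ensemble_target_loss(classifiers, x_adv, target)


if __name__ == "__main__":
    # Toy stand-ins for KWS models (not real networks).
    uncertain = lambda x: [0.0, 0.0]
    fooled = lambda x: [-5.0, 5.0]  # confidently predicts class 1
    x_adv = apply_perturbation([0.5, -0.9], [0.2, -0.5], eps=0.1)
    print(x_adv)
    print(one_hot(1, 2))
    print(generator_loss(0.9, [uncertain, fooled], x_adv, target=1))
```

In a real training loop these scalars would be computed on tensors in a deep-learning framework. The target KWS models would be kept frozen, so that the targeted term back-propagates only into the generator's weights, as the abstract describes. Averaging over several classifiers is one simple way to push the generator toward perturbations that do not overfit a single model.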
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic