Masking Speech Feature to Detect Adversarial Examples for Speaker Verification

Xing Chen, Jiadi Yao, Xiao-Lei Zhang
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980334
2022-01-01
Abstract: Adversarial examples for speaker verification (SV) systems are clean audio recordings to which imperceptible perturbations have been added. They are generated to manipulate the decision of an SV system, which poses a serious threat to its security. Many adversarial example detection methods have therefore been proposed to defend against such attacks. However, existing methods either require additional training of detection models or are time-consuming. In this paper, we propose a training-free and effective method to detect adversarial examples. It simply masks the parts of the input speech features (e.g. LogFBank) that contain less speaker information. Masking these parts has only a small impact on the scores of genuine examples but a large impact on the scores of adversarial examples. Adversarial examples can therefore be detected by analyzing the absolute change in the verification score before and after masking. Experimental results on ResNet34 show that our method outperforms the training-dependent Parallel-Wave-GAN baseline while consuming only 1/10 of the baseline's detection time.
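The detection rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cosine scoring, the function names (`mask_features`, `detect_adversarial`, `toy_embed`), the choice of which frequency bins to mask, and the decision threshold are all assumptions made for illustration. In particular, the abstract does not say how the low-information parts are selected, so the mask is passed in as a given.

```python
import numpy as np


def mask_features(feats, band_mask):
    """Return a copy of feats (frames x bins) with the flagged bins zeroed.

    band_mask is a boolean vector over frequency bins; True marks a bin
    assumed to carry little speaker information (a hypothetical choice).
    """
    masked = feats.copy()
    masked[:, band_mask] = 0.0
    return masked


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def detect_adversarial(feats, enroll_emb, embed_fn, band_mask, threshold):
    """Flag an input when masking changes its score by more than threshold.

    embed_fn stands in for the SV model's embedding extractor; threshold
    is a tunable value assumed here, not taken from the paper.
    """
    score_before = cosine_score(embed_fn(feats), enroll_emb)
    score_after = cosine_score(embed_fn(mask_features(feats, band_mask)), enroll_emb)
    delta = abs(score_before - score_after)  # absolute score change
    return delta > threshold, delta


def toy_embed(feats):
    """Hypothetical stand-in for an SV model: average the features over time."""
    return feats.mean(axis=0)
```

With a real system, `embed_fn` would wrap the speaker embedding network and the threshold would be chosen on held-out genuine trials. Because the rule needs only two forward passes and no extra model, it is consistent with the abstract's claim of a training-free, low-cost detector.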