Spotting adversarial samples for speaker verification by neural vocoders

Haibin Wu,Po-chun Hsu,Ji Gao,Shanshan Zhang,Shen Huang,Jian Kang,Zhiyong Wu,Helen M. Meng,Hung-yi Lee
2021-01-01
Abstract:Automatic speaker verification (ASV), one of the most important technology for biometric identification, has been widely adopted in security-critical applications, including transaction authentication and access control. However, previous work has shown that ASV is seriously vulnerable to recently emerged adversarial attacks, yet effective counter-measures against them are limited. In this paper, we adopt neural vocoders to spot adversarial samples for ASV. We use the neural vocoder to re-synthesize audio and find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discrimination between genuine and adversarial samples. This effort is, to the best of our knowledge, among the first to pursue such a technical direction for detecting adversarial samples for ASV, and hence there is a lack of established baselines for comparison. Consequently, we implement the Griffin-Lim algorithm as the detection baseline. The proposed approach achieves effective detection performance that outperforms all the base-lines in all the settings. We also show that the neural vocoder adopted in the detection framework is dataset-independent. Our codes will be made open-source for future works to do fair comparison 1 .
Computer Science
What problem does this paper attempt to address?