Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Yang Xie, Zhenchuan Zhang, Yingchun Yang
DOI: https://doi.org/10.21437/Interspeech.2021-847
2021-01-01
Abstract: Automatic speaker verification is vulnerable to spoofing attacks with synthesized or converted speech. Although high-performance anti-spoofing countermeasures can achieve high accuracy when the training and testing spoofing attack examples are similarly distributed, their performance degrades significantly when confronted with out-of-distribution spoofing speech, which is created by increasingly advanced unseen speech synthesis and voice conversion methods. Since it is unrealistic to collect enough labeled data from each new spoofing attack method, we argue that addressing the problem of out-of-distribution generalization for spoofing speech detection is essential. In this work, we propose a two-phase representation learning system based on a Siamese network for spoofing speech detection tasks. During the representation learning phase, an embedding Siamese neural network is trained with the wav2vec features to distinguish whether the speech samples in a pair belong to the same category. The proposed system decreases the equal error rate from the state-of-the-art result of 4.07% to 1.15% on the ASVspoof 2019 evaluation set.
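The abstract reports its result as an equal error rate (EER), the operating point at which the false acceptance rate for spoofed speech equals the false rejection rate for bona fide speech. The following is a minimal sketch of how an EER can be approximated from detection scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: a higher score means the utterance is more likely bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their midpoint as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical toy scores, not data from the paper.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(bonafide, spoof))
```

With these toy scores, one bona fide utterance falls below one spoofed utterance, so the two error rates meet at one in four. Evaluations on large test sets typically interpolate between thresholds rather than taking the nearest observed one, which this sketch omits for brevity.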