A Light CNN with Split Batch Normalization for Spoofed Speech Detection Using Data Augmentation

Haojian Lin, Yang Ai, Zhenhua Ling
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980260
2022-01-01
Abstract: Automatic speaker verification (ASV) systems are vulnerable to rapidly developing speech synthesis and voice conversion techniques, so anti-spoofing systems are urgently needed. This paper proposes a novel spoofed speech detection model that makes better use of augmented data at the training stage. The model adopts a light convolutional neural network (LCNN) with a split batch normalization (SBN) structure to alleviate the data pollution caused by data augmentation. A pre-trained wav2vec 2.0 model is used to extract features from input speech waveforms. Three data augmentation strategies (audio compression, mixup, and channel simulation) are compared in our experiments. Experimental results demonstrate that the proposed method achieves a state-of-the-art equal error rate (EER) of 0.258% on the ASVspoof2019 LA task. Further analysis also confirms the effectiveness of the pre-trained model for feature extraction, the data augmentation strategies, and the proposed SBNLCNN model in improving the performance of spoofed speech detection.
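For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an equal error rate can be computed from detection scores. It is not the authors' evaluation code or the official ASVspoof scoring tool. The function name compute_eer, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate (EER) by sweeping thresholds.

    Assumption: a higher score means the utterance looks more bona fide,
    and an utterance is accepted when its score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above
    # all scores so that "reject everything" is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    eer = None
    for t in thresholds:
        # False acceptance rate: spoofed utterances that are accepted.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide utterances that are rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy scores (hypothetical, not from the paper).
    bonafide = [0.9, 0.6, 0.4, 0.8]
    spoof = [0.1, 0.5, 0.3, 0.2]
    print(f"EER = {100 * compute_eer(bonafide, spoof):.1f}%")
```

In this sketch the EER is the error rate at the threshold where the false acceptance and false rejection rates are closest, so a lower value means bona fide and spoofed speech are better separated. Official evaluation tools typically interpolate between thresholds rather than picking the nearest one, so their results can differ slightly on small score sets.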