Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection

Haochen Wu, Wu Guo, Shengyu Peng, Zhuhai Li, Jie Zhang
DOI: https://doi.org/10.21437/interspeech.2024-253
2024-01-01
Abstract: Speech anti-spoofing models can be improved by using a large pre-trained model, e.g., Wav2vec2 or WavLM, as the front-end. However, apart from the heavy computation overhead, fine-tuning a pre-trained model is prone to over-fitting and catastrophic forgetting due to limited training data. In this paper, we propose a novel adapter learning framework based on a pre-trained model for robust spoof speech detection. We consider two adapter cases, i.e., intra-block adapters and cross-block adapters, which are inserted into or appended to the backbone Wav2vec2. During training, the backbone is frozen and only the parameters of the adapters are updated. The local-global task-dependent information for spoof speech detection is obtained via the proposed adapter learning with a marginal increase of parameters. Results on three benchmark datasets validate the superiority of the proposed method over the baseline and existing SOTA systems.
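To give a rough sense of what a "marginal increase of parameters" can mean when only adapters are trained on a frozen backbone, the following is a back-of-the-envelope sketch, not the authors' configuration. It assumes a generic bottleneck adapter (a down-projection followed by an up-projection, each with a bias) and one adapter per Transformer block. The hidden size, block count, bottleneck width, and backbone size are all illustrative assumptions, and the helper name `bottleneck_adapter_params` is hypothetical.

```python
def bottleneck_adapter_params(hidden_dim: int, bottleneck_dim: int) -> int:
    """Count the parameters of one bottleneck adapter (weights plus biases).

    Assumed structure: hidden_dim -> bottleneck_dim -> hidden_dim.
    """
    down = hidden_dim * bottleneck_dim + bottleneck_dim  # down-projection
    up = bottleneck_dim * hidden_dim + hidden_dim        # up-projection
    return down + up


# Illustrative assumptions only; none of these values come from the abstract.
HIDDEN_DIM = 1024               # assumed Transformer hidden size
NUM_BLOCKS = 24                 # assumed number of Transformer blocks
BOTTLENECK_DIM = 64             # assumed adapter bottleneck width
BACKBONE_PARAMS = 317_000_000   # assumed size of the frozen backbone

per_adapter = bottleneck_adapter_params(HIDDEN_DIM, BOTTLENECK_DIM)
trainable = per_adapter * NUM_BLOCKS      # one adapter per block (assumption)
fraction = trainable / BACKBONE_PARAMS    # trainable share relative to backbone

print(f"per adapter: {per_adapter:,} parameters")
print(f"all adapters: {trainable:,} parameters")
print(f"share of backbone: {fraction:.2%}")
```

Under these assumed values the trainable adapter parameters amount to roughly one percent of the frozen backbone; the actual figure depends on the adapter design and sizes used in the paper.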