A Multi-task Framework of Speaker Recognition with TTS Data Augmentation

Xingjia Xie, Yiming Zhi, Beibei Ouyang, Qingyang Hong, Lin Li
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980253
2022-01-01
Abstract: Deep learning usually requires large amounts of data, but collecting enough training data is difficult in many fields. In resource-limited application scenarios, data augmentation often plays a key role. A common approach is to add background noise to the speech or change its speed to increase the number of utterances in the training dataset. For training automatic speaker verification (ASV) models, we propose augmenting the training dataset by synthesizing large amounts of speech with a VAE-based speech synthesis model, and we use a multi-task framework to mitigate the degradation in anti-spoofing detection performance caused by introducing synthesized speech. Experiments on the AISHELL-1, AISHELL-3, and ASVspoof 2019 LA databases show that the proposed method improves the robustness of the speaker recognition model while also improving its anti-spoofing ability.
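The two conventional augmentations mentioned in the abstract, adding background noise and changing the speed of speech, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names `add_noise` and `change_speed`, the SNR-based mixing, and the linear-interpolation resampling are all assumptions chosen for illustration, and the abstract does not specify how these augmentations are configured.

```python
import numpy as np


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target signal-to-noise ratio in dB (hypothetical helper)."""
    # Tile or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def change_speed(speech: np.ndarray, factor: float) -> np.ndarray:
    """Resample by linear interpolation; factor > 1 yields a shorter, faster signal (hypothetical helper)."""
    n_out = int(round(len(speech) / factor))
    old_idx = np.arange(len(speech))
    new_idx = np.linspace(0, len(speech) - 1, n_out)
    return np.interp(new_idx, old_idx, speech)


# Toy example: a 1-second 220 Hz tone at 16 kHz stands in for a real utterance.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.normal(size=4000)

noisy = add_noise(speech, noise, snr_db=10.0)
faster = change_speed(speech, factor=1.1)
slower = change_speed(speech, factor=0.9)
```

Each call produces an additional training utterance from the same source recording. Note that plain resampling of this kind shifts pitch along with tempo, so the perturbed signal does not sound exactly like the original speaker talking faster or slower.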