Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition

Zhilong Zhang, Wei Wang, Yanmin Qian
DOI: https://doi.org/10.21437/interspeech.2023-1630
2023-01-01
Abstract: Recent advances in self-supervised learning (SSL) have remarkably improved speech recognition performance for low-resource languages. However, as SSL requires data of an increasingly large scale, the pre-training process has become extremely time-consuming. To address this problem, we propose an unsupervised data selection method based on utterance-level language similarity and a curriculum learning strategy to boost the efficiency of multilingual SSL pre-training while maintaining performance. We conduct experiments on five languages in the COMMONVOICE dataset. Compared with the baseline that uses all data for pre-training, our approach pre-trains on only 25% of the data and saves 60% of the training steps, while achieving even better performance on the target low-resource language.