Finer-grained Modeling Units-Based Meta-Learning for Low-resource Tibetan Speech Recognition

Siqing Qin, Longbiao Wang, Sheng Li, Yuqin Lin, Jianwu Dang
DOI: https://doi.org/10.21437/interspeech.2022-10015
2022-01-01
Abstract: Tibetan is a typical under-resourced language because of its relatively small speaker population. Although character-based end-to-end (E2E) automatic speech recognition (ASR) models with transfer learning and multilingual training strategies have mitigated the low-resource problem, they often suffer from overfitting. Recently, meta-learning has performed well in addressing overfitting. However, the widely used coarse-grained modeling units are not strongly correlated with pronunciation, which limits the performance gains of low-resource ASR systems. Furthermore, meta-learning consists of a meta-training stage followed by fast self-adaptation to the target language, and previous meta-training stages lack target-language-specific information. Therefore, we propose a novel E2E low-resource ASR model for the Lhasa dialect based on finer-grained modeling units and transfer learning, drawing on the properties of Chinese Pinyin. Chinese Pinyin and decomposed Tibetan radicals are more closely related to pronunciation than characters are, so they can supply additional acoustic information in low-resource conditions. In addition, Tibetan modeling units are used in both the meta-training and fast self-adaptation stages to provide language-specific information that helps address the low-resource problem. Experiments show that the proposed method achieves a 54.9% relative character error rate reduction over the baseline system.
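The abstract does not specify exactly how Tibetan characters are decomposed into finer-grained units. As a rough illustration only, the sketch below assumes a hypothetical scheme: coarse units are tsheg-delimited syllables, and fine units are the Unicode code points inside each syllable (head consonants, subjoined consonants, vowel signs) after NFD normalization. The names `coarse_units`, `fine_units`, and `unit_type` are hypothetical helpers, not the authors' code.

```python
import unicodedata

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg) separating syllables


def unit_type(ch: str) -> str:
    """Rough category of a Tibetan code point, based on the Unicode block layout."""
    cp = ord(ch)
    if 0x0F40 <= cp <= 0x0F6C:
        return "base"        # head consonant letters
    if 0x0F90 <= cp <= 0x0FBC:
        return "subjoined"   # consonants stacked below the head letter
    if 0x0F71 <= cp <= 0x0F84:
        return "vowel/mark"  # vowel signs and related marks
    return "other"


def coarse_units(text: str) -> list[str]:
    """Coarse (syllable-level) units: split on tsheg and whitespace."""
    return [s for s in text.replace(" ", TSHEG).split(TSHEG) if s]


def fine_units(text: str) -> list[list[str]]:
    """Hypothetical finer-grained units: each syllable split into code points.

    NFD first splits precomposed letters (e.g. U+0F43 GHA -> GA + subjoined HA),
    so every stacked component becomes a separate token.
    """
    return [list(unicodedata.normalize("NFD", s)) for s in coarse_units(text)]


if __name__ == "__main__":
    text = "\u0f56\u0f7c\u0f51" + TSHEG + "\u0f66\u0f90\u0f51"  # "bod skad" (Tibetan language)
    for syllable in fine_units(text):
        print([(c, unit_type(c)) for c in syllable])
```

Under this assumed scheme, the two syllables of "bod skad" become six smaller tokens drawn from a much smaller inventory of letters and signs. This mirrors, loosely, how Pinyin breaks a Chinese character into pronunciation-related parts.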