Syllable-based Speech Recognition for a Very Low-Resource Language, Chaha

Tessfu Geteye Fantaye, Junqing Yu, Tulu Tilahun Hailu
DOI: https://doi.org/10.1145/3377713.3377794
2019-01-01
Abstract: Chaha is a very low-resource language that suffers from a lack of the language resources needed to develop human language technologies such as speech recognition. Moreover, the Chaha writing system is syllabic, with a consonant-vowel (CV) syllable structure, and its orthography has a one-to-one correspondence with syllable sound units. Considering these facts, this study is the first to explore the use of CV syllables as acoustic modeling units for developing Chaha speech recognizers, using the Gaussian mixture model (GMM) and unilingual and transfer learning deep neural network (DNN) models. Our experimental results demonstrate that the syllable-based unilingual DNN and transfer learning DNN models outperform the corresponding GMM and unilingual DNN models, with absolute performance improvements of 2.8 to 3.09% and 1.07 to 4.94%, respectively. The best-performing syllable-based recognizer is achieved using a shared hidden layer (SHL) time delay deep neural network (TDNN) model, with a word error rate (WER) of 23.11%. Hence, CV syllables are suitable acoustic units for developing Chaha speech recognition systems, given a sufficient training corpus.