Lhasa Dialect Recognition of Different Phonemes Based on TDNN Method.

Kuntharrgyal Khysru, Yangzhuoma Qie, Haiqiang Shi, Qilong Sun, Jianguo Wei
DOI: https://doi.org/10.1007/978-3-031-06788-4_13
2022-01-01
Abstract: Time-delay neural network (TDNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian mixture model-hidden Markov model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. We aim to develop a practical Lhasa Tibetan ASR system. To improve recognition accuracy, this paper investigates the performance of TDNN-based Tibetan acoustic modeling with several different phoneme sets, each defined using linguistic and phonological knowledge of the Lhasa dialect of Tibetan. Experiments are conducted on a Tibetan corpus recorded by 20 speakers, using a bigram language model over phones. The phone error rate (PER) results show that the acoustic model with the CTL set performs best, achieving a 10.43% relative improvement in accuracy over the basic phoneme set. Moreover, our results confirm that, for Lhasa Tibetan acoustic modeling, the TDNN-HMM paradigm outperforms the conventional GMM-HMM.
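For readers unfamiliar with the metric, the following is a minimal sketch of how a phone error rate is commonly computed: the Levenshtein (edit) distance between the reference and recognized phone sequences, divided by the reference length. The function names and the phone symbols in the usage note are illustrative assumptions, not taken from the paper, and the abstract does not state which scoring tool the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER in percent: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline
```

As a usage example with made-up phone labels, `phone_error_rate(["k", "a", "n", "g"], ["k", "a", "m"])` counts one substitution and one deletion against four reference phones. A relative figure such as the one quoted in the abstract can then be obtained by passing the baseline and improved scores to `relative_change`; whether the authors computed it on accuracy or on PER is not specified in the abstract.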