Investigation on Acoustic Modeling with Different Phoneme Set for Continuous Lhasa Tibetan Recognition Based on DNN Method
Hongcui Wang, Kuntharrgyal Khyuru, Jian Li, Guanyu Li, Jianwu Dang, Lixia Huang
DOI: https://doi.org/10.1109/apsipa.2016.7820795
2016-01-01
Abstract: Deep neural network (DNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian mixture model hidden Markov model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. We aim to develop a practical Lhasa Tibetan ASR system. To improve recognition accuracy, this paper investigates the performance of DNN-based Tibetan acoustic modeling with several different phoneme sets, each defined from linguistic and phonological knowledge of the Lhasa dialect of Tibetan. Experiments are conducted on a Tibetan corpus recorded by 20 speakers, using a bigram language model over phones. The phone error rate (PER) results show that the acoustic model with the CTL set performs best, with a relative improvement of 10.43% over the basic phoneme set. Moreover, our results confirm that, for Lhasa Tibetan acoustic modeling, the DNN-HMM paradigm outperforms the conventional GMM-HMM.
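For readers unfamiliar with the metric, the following is a minimal sketch of how a phone error rate and a relative improvement are commonly computed. It is not taken from the paper: the function names are hypothetical, PER is assumed to be the Levenshtein distance between reference and hypothesis phone sequences divided by the number of reference phones, and the relative improvement is assumed to be a relative reduction in PER, which is only one possible reading of the reported 10.43% figure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # cost of deleting the first i reference phones
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Total edit errors over total reference phones (assumed PER definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(baseline_per, new_per):
    """Relative PER reduction with respect to a baseline (assumed reading)."""
    return (baseline_per - new_per) / baseline_per
```

Under these assumptions, `relative_reduction(per_basic, per_ctl)` would be the quantity compared across phoneme sets, where `per_basic` and `per_ctl` are placeholder names for the PER values of the two systems.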