Mongolian acoustic modeling based on deep neural network

Zhiqiang MA, Tuya LI, Shuangtao YANG, Li ZHANG
DOI: https://doi.org/10.11992/tis.201710029
2018-01-01
Abstract: In acoustic modeling for Mongolian speech recognition, the Gaussian mixture model (GMM) has difficulty adequately describing the correlations among Mongolian acoustic features and depends on an independence assumption, so this study investigates an acoustic model based on a deep neural network (DNN). First, a DNN was used to classify and learn the internal structure of phonetic features in order to extract Mongolian acoustic features, and a DNN-HMM Mongolian acoustic model was constructed. Second, a training algorithm was designed that combines unsupervised pre-training with supervised fine-tuning. In addition, dropout was added to the training of the DNN-HMM Mongolian acoustic model to avoid over-fitting. Finally, the GMM-HMM and DNN-HMM Mongolian acoustic models were compared experimentally on a small-scale corpus using the Kaldi experimental platform. The experimental results show that the DNN-HMM Mongolian model reduced the word recognition error rate by 7.5% and the sentence recognition error rate by 13.63%. They also show that adopting dropout during training effectively avoids over-fitting of the DNN-HMM Mongolian acoustic model.
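The abstract does not say how dropout was configured, so the following is only a generic sketch of the technique, not the authors' Kaldi setup. It shows "inverted" dropout applied to one hidden layer's activations: during training each unit is zeroed with probability `drop_prob` and the survivors are rescaled, while at test time the activations pass through unchanged. The function name `dropout`, its parameters, and the 0.5 rate in the usage lines are assumptions made for illustration.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Apply inverted dropout to a list of hidden-unit activations.

    Hypothetical helper for illustration; not taken from the paper or Kaldi.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At test time (or with no dropout) the layer output is left untouched.
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random
    keep_prob = 1.0 - drop_prob
    # Keep each unit with probability keep_prob and rescale it by 1/keep_prob
    # so the expected activation matches the test-time value.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Usage: a toy hidden layer of four units, with an assumed drop rate of 0.5.
hidden = [0.2, 1.5, 0.7, 0.9]
train_out = dropout(hidden, 0.5, training=True, rng=random.Random(0))
test_out = dropout(hidden, 0.5, training=False)
```

Because a different random subset of units is silenced at every training step, no single unit can rely on specific co-adapted partners, which is the usual explanation for why dropout reduces over-fitting on small corpora.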