Abstract: This paper addresses large vocabulary continuous speech recognition for telephone conversational Tibetan for the first time. Because Tibetan is a minority language, the major difficulty in Tibetan speech recognition is the shortage of training data. In this paper, the Tibetan acoustic model is trained with deep neural networks (DNN). To address the data shortage, DNN models of other, majority languages are used as the initial networks for the target Tibetan DNN model. In addition, expert-designed phonetic questions for Tibetan are unavailable because of a lack of phonetic knowledge. To reduce the number of tri-phone hidden Markov models (HMM) in Tibetan speech recognition, phonetic questions generated automatically in a data-driven manner are used to tie the tri-phone HMMs. Several clusterings of tri-phone states are tested, and the Gaussian mixture model (GMM) system reaches a word accuracy of about 30.86% on the test corpus. For the DNN-based acoustic model, three DNN models trained on different large corpora are adopted. The experimental results show that the proposed methods improve recognition performance, with a word accuracy of about 43.26% on the test corpus.
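
The cross-lingual initialization described in the abstract can be illustrated with a minimal sketch. The abstract does not say which layers are transferred, so this example assumes that all hidden layers are copied from the majority-language DNN and that only the output layer is re-initialized to match the Tibetan tied tri-phone states. The layer sizes, the state counts, and the function names `init_mlp`, `transfer_hidden_layers`, and `forward` are hypothetical and are not taken from the paper.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Return a list of (W, b) pairs for a plain feed-forward network."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def transfer_hidden_layers(source_params, n_target_states, rng):
    """Copy the hidden layers of the source model and attach a fresh output layer.

    Assumption: only the output layer is language specific.
    """
    hidden = [(W.copy(), b.copy()) for W, b in source_params[:-1]]
    last_hidden_dim = source_params[-1][0].shape[0]
    output = (
        rng.normal(0.0, 0.1, size=(last_hidden_dim, n_target_states)),
        np.zeros(n_target_states),
    )
    return hidden + [output]


def forward(params, x):
    """Sigmoid hidden layers followed by a softmax over tied states."""
    h = x
    for W, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
# Hypothetical sizes: 39-dim features, two hidden layers, 500 source-language states.
source = init_mlp([39, 64, 64, 500], rng)
# Hypothetical target: 120 tied tri-phone states for Tibetan.
target = transfer_hidden_layers(source, n_target_states=120, rng=rng)
posteriors = forward(target, rng.normal(size=(4, 39)))
```

After this initialization, the target network would be fine-tuned on the Tibetan training data; the training loop is omitted here because the abstract gives no details about it.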

Tibetan Language Continuous Speech Recognition Based on Dynamic Bayesian Network

Tibetan Language Continuous Speech Recognition Based on Active WS-DBN

Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Research on the Algorithm of Tibetan Speech Recognition Based on DBN

Unsupervised Tibetan speech features Learning based on Dynamic Bayesian Networks

A Dynamic Bayesian Network Based Framework for Continuous Speech Recognition and Its Token Passing Model

Speech Recognition Based on Deep Neural Networks on Tibetan Corpus

Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

Mongolian acoustic modeling based on deep neural network

Automatic Speaker Recognition Using Dynamic Bayesian Network.

Deep Neural Network based Uyghur Large Vocabulary Continuous Speech Recognition

Dynamic Bayesian network approach to speaker identification

DBN Based Multi-Stream Models for Speech

Continuous Speech Recognition for Large Vocabulary Based on Triphone DBN Model

Investigation on Acoustic Modeling with Different Phoneme Set for Continuous Lhasa Tibetan Recognition Based on DNN Method

Speaker recognition of Yunnan minority accent Based on bayesian network

Tibetan-Mandarin Bilingual Speech Recognition Based on End-to-end Framework

Study on Continuous Speech Recognition based on Bottleneck Features for Lhasa-Tibetan Dialect

Novel Articulatory Feature based Dynamic Bayesian Network model for speech recognition

Lhasa Dialect Recognition of Different Phonemes Based on TDNN Method.