Abstract: This paper is the first to address large vocabulary continuous speech recognition of conversational telephone Tibetan. Because Tibetan is a minority language, the main difficulty in Tibetan speech recognition is the shortage of training data. In this paper, the Tibetan acoustic model is trained with deep neural networks (DNNs). To address the data shortage, DNN models of other majority languages are used as the initial networks for the target Tibetan DNN model. In addition, phonetic questions for Tibetan written by phonetic experts are unavailable because phonetic knowledge of the language is lacking. To reduce the number of tri-phone hidden Markov models (HMMs), phonetic questions generated automatically in a data-driven manner are used to tie the tri-phone HMM states. Different clusterings of tri-phone states are tested, and with a Gaussian mixture model (GMM) acoustic model the word accuracy on the test corpus is about 30.86%. For the DNN acoustic model, three DNN models trained on different large corpora are adopted. The experimental results show that the proposed methods improve recognition performance, with word accuracy on the test corpus reaching about 43.26%.
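The abstract does not describe how the data-driven phonetic questions are built. One common family of approaches clusters phones by acoustic similarity and turns each cluster into a yes/no context question for decision-tree state tying. The sketch below is an illustration, not the authors' procedure, and it rests on three assumptions: each phone is summarised by a single mean feature vector, the clustering is plain bottom-up (agglomerative) with Euclidean distance between centroids, and the function name `generate_questions` and the toy vectors `TOY_PHONE_MEANS` are hypothetical.

```python
from itertools import combinations
from math import dist


def generate_questions(phone_means):
    """Build nested phone sets by bottom-up (agglomerative) clustering.

    phone_means: dict mapping a phone label to one mean feature vector
    (a simplifying assumption; real systems use richer statistics).
    Returns a list of frozensets, each usable as a yes/no context question
    such as "is the left neighbour in this set?".
    """
    # Each cluster is (phone set, centroid, number of phones merged into it).
    clusters = [(frozenset([p]), list(v), 1) for p, v in phone_means.items()]
    # Singleton sets are kept as questions too ("is the neighbour exactly p?").
    questions = [phones for phones, _, _ in clusters]

    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest (Euclidean).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: dist(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        (set_i, cen_i, n_i), (set_j, cen_j, n_j) = clusters[i], clusters[j]
        # New centroid is the phone-count-weighted mean of the two centroids.
        merged_centroid = [
            (n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(cen_i, cen_j)
        ]
        merged = (set_i | set_j, merged_centroid, n_i + n_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        # Every merged cluster becomes one more candidate question.
        questions.append(merged[0])

    return questions


# Hypothetical 2-D "acoustic" means for five phones, for illustration only.
TOY_PHONE_MEANS = {
    "a": [1.0, 0.0],
    "e": [1.1, 0.1],
    "p": [-1.0, 2.0],
    "t": [-1.2, 2.2],
    "s": [-3.0, -3.0],
}

if __name__ == "__main__":
    for q in generate_questions(TOY_PHONE_MEANS):
        print(sorted(q))
```

With these toy vectors, {a, e} merges first and {p, t} second, so both sets appear as questions before the larger sets. In a full system these sets would be handed to a decision-tree tying step that, at each node, picks the question giving the largest likelihood gain. Likelihood-based distances are often preferred over the plain Euclidean distance used here.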

IMUT-MC: a Speech Corpus for Mongolian Speech Recognition

A Mongolian Speech Recognition System Based on HMM

Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus

MNASR: A Free Speech Corpus for Mongolian Speech Recognition and Accompanied Baselines

Improving of Acoustic Model for the Mongolian Speech Recognition System

Construction of A Mongolian Telephone Speech Corpus

Assembling Chinese-Mongolian Speech Corpus via Crowdsourcing

A Method to Construct an Adaptive Mongolian Speech Acoustic Model

Mongolian acoustic modeling based on deep neural network

M2ASR-MONGO: A Free Mongolian Speech Database and Accompanied Baselines

End-to-End Mongolian Text-to-Speech System

Mongolian Speech Recognition Based on Deep Neural Networks

MC²: Towards Transparent and Culturally-Aware NLP for Minority Languages in China

An undergraduate Mandarin speech database for speaker recognition research

Utilizing Crowdsourcing for the Construction of Chinese-Mongolian Speech Corpus with Evaluation Mechanism

Tibetan-Mandarin Bilingual Speech Recognition Based on End-to-end Framework

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

Speech Recognition Based on Deep Neural Networks on Tibetan Corpus

A Study on Yunnan Dialectal Chinese Speech Recognition

MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset