Abstract: This paper proposes an acoustic modeling approach based on bootstrap and restructuring for dealing with data sparsity in low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) within the speed, memory, and response latency requirements of real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from the bootstrapped training data are aggregated for reliable sequence prediction. The aggregation yields an HMM with superior prediction capability at the cost of a substantially larger size. For practical use, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims to reduce the aggregated HMM to a desired model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms are investigated in the full covariance model space before conversion to the diagonal covariance model space in the last stage of the restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari show that acoustic models obtained by the proposed approach can outperform those from the conventional training procedure with almost the same run-time memory consumption and decoding speed.
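The bootstrap, aggregation, and clustering steps described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a single HMM state with one-dimensional features, fits one Gaussian per bootstrap replicate, and uses a weighted log-variance increase as the merge criterion, which is only one of many possible Gaussian clustering criteria. All function names are hypothetical. The model refinement step and the full-to-diagonal covariance conversion are omitted (in one dimension there is nothing to convert).

```python
import math
import random

def bootstrap_gaussians(frames, n_replicates, rng):
    """Fit one Gaussian per bootstrap resample and aggregate with equal weights."""
    n = len(frames)
    comps = []
    for _ in range(n_replicates):
        # Resample the training frames with replacement.
        sample = [frames[rng.randrange(n)] for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / n + 1e-6  # variance floor
        comps.append((1.0 / n_replicates, mean, var))
    return comps

def merge(a, b):
    """Moment-matched merge of two weighted Gaussians (weight, mean, variance)."""
    (wa, ma, va), (wb, mb, vb) = a, b
    w = wa + wb
    m = (wa * ma + wb * mb) / w
    v = (wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)) / w
    return w, m, v

def merge_cost(a, b):
    """Assumed criterion: increase in weighted log-variance caused by the merge."""
    w, _, v = merge(a, b)
    return w * math.log(v) - a[0] * math.log(a[2]) - b[0] * math.log(b[2])

def restructure(comps, target_size):
    """Greedily merge the cheapest pair until the mixture has target_size components."""
    comps = list(comps)
    target_size = max(target_size, 1)
    while len(comps) > target_size:
        _, i, j = min(
            (merge_cost(comps[i], comps[j]), i, j)
            for i in range(len(comps))
            for j in range(i + 1, len(comps))
        )
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

# Usage: aggregate 8 bootstrap replicates, then shrink the mixture to 2 Gaussians.
rng = random.Random(0)
frames = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic stand-in for features
aggregated = bootstrap_gaussians(frames, n_replicates=8, rng=rng)
compact = restructure(aggregated, target_size=2)
```

The aggregated mixture grows linearly with the number of replicates, which mirrors the size cost mentioned in the abstract; the greedy merge brings it back to a fixed budget while preserving the total mixture weight.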

PHMM Based Asynchronous Acoustic Model for Chinese Large Vocabulary Continuous Speech Recognition

Context Dependent Syllable Acoustic Model For Continuous Chinese Speech Recognition

Exploiting Prosodic and Lexical Features for Tone Modeling in A Conditional Random Field Framework

Maximum Entropy Based Tone Modeling for Mandarin Speech Recognition

A New Acoustic Modeling of Inter-Syllable Context-Dependent Units for Putonghua Continuous Speech Recognition

Probabilistic Speaker-Class Based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition

Study on Framework for Chinese Pronunciation Variation Modeling

Product HMM-based Training Method for Acoustic Model with Multiple-Size Units

Research on Context-Dependent Acoustical Unit (Triphone) for Mandarin Continuous Speech Recognition

Tone Recognition of Chinese Continuous Speech

Acoustic Modeling Based On Chinese Phonetics Knowledge

Asynchronous F0 and Spectrum Modeling for HMM-based Speech Synthesis

Deep neural networks for syllable based acoustic modeling in Chinese speech recognition

Hidden Markov Acoustic Modeling with Bootstrap and Restructuring for Low-Resourced Languages

Real Context Model for Tone Recognition in Mandarin Conversational Telephone Speech

Mandarin tone modeling using recurrent neural networks

A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Initial/final acoustic model based on separating nasal coda in Chinese Putonghua speech recognition

Context Dependent Initial/final Acoustic Modeling for Continuous Chinese Speech Recognition