Abstract: Tone modeling is important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, the main carrier of the tone pattern, is described as a mixed stochastic trajectory. The mean trajectory is represented by a polynomial function of normalized time, while the variance is time-varying. Effective training and tone recognition algorithms were developed. In experiments, the proposed MSPTM reduced the tone recognition error rate by 40.7% relative to the traditional Hidden Markov Model (HMM) tone model. We also present a decision-tree-based approach to learning tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect tone patterns were taken into consideration while constructing the tree. After the tree was built, 28 different tone patterns were obtained. We found that, in addition to the tone of the neighboring syllable, the consonant/vowel type of the syllable and the position of the syllable in the utterance also contributed substantially to tone pattern variation in continuous speech. Finally, a new approach to integrating tone information into the search process at the word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and the tone information integration method were effective, achieving a 16.2% relative character error rate reduction.
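
To make the trajectory idea concrete, the following is a minimal sketch of how a pitch contour might be scored against a polynomial mean trajectory over normalized time with a time-varying variance, and how a tone could then be picked by maximum likelihood. It is not the paper's implementation: the frame-independence assumption, the piecewise-constant variance slots, the function names, the tone labels, and all coefficient, weight, and variance values are hypothetical and chosen only for illustration.

```python
import math


def poly_mean(coeffs, t):
    """Evaluate the mean trajectory c0 + c1*t + c2*t^2 + ... at normalized time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def component_loglik(contour, coeffs, variances):
    """Log-likelihood of a pitch contour under one Gaussian trajectory component.

    contour   -- pitch values (e.g. normalized log-F0), one per frame
    coeffs    -- polynomial coefficients of the mean trajectory
    variances -- one variance per time slot (assumed piecewise-constant in time)
    Frames are treated as independent given the trajectory (an assumption).
    """
    n = len(contour)
    total = 0.0
    for i, y in enumerate(contour):
        # Normalized time in [0, 1], so contours of any length are comparable.
        t = i / (n - 1) if n > 1 else 0.0
        # Pick the variance slot that covers this normalized time.
        slot = min(int(t * len(variances)), len(variances) - 1)
        var = variances[slot]
        mu = poly_mean(coeffs, t)
        total += -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return total


def mixture_loglik(contour, components):
    """Log-likelihood under a mixture; components is a list of (weight, coeffs, variances)."""
    logs = [math.log(w) + component_loglik(contour, c, v) for w, c, v in components]
    m = max(logs)
    # Log-sum-exp for numerical stability.
    return m + math.log(sum(math.exp(x - m) for x in logs))


def classify_tone(contour, tone_models):
    """Return the tone label whose mixture gives the highest log-likelihood."""
    return max(tone_models, key=lambda tone: mixture_loglik(contour, tone_models[tone]))


# Hypothetical toy models: a level, a rising, and a falling tone.
TONE_MODELS = {
    "level": [(1.0, [1.0], [0.05, 0.05, 0.05])],
    "rising": [
        (0.6, [-1.0, 2.0], [0.05, 0.08, 0.05]),
        (0.4, [-0.8, 1.0, 0.8], [0.05, 0.08, 0.05]),
    ],
    "falling": [(1.0, [1.0, -2.0], [0.05, 0.08, 0.05])],
}

rising_contour = [-0.9, -0.5, 0.0, 0.6, 1.0]
falling_contour = [1.1, 0.4, 0.0, -0.6, -0.9]
level_contour = [1.0, 1.0, 0.9, 1.0, 1.1]
```

Calling `classify_tone(rising_contour, TONE_MODELS)` selects the model whose mean trajectory best matches the contour. Because time is normalized to the syllable's duration, contours with different numbers of frames are scored against the same trajectory.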

Discriminative Incorporation of Explicitly Trained Tone Models into Lattice Based Rescoring for Mandarin Speech Recognition

Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models

Discriminative Tone Model Training and Optimal Integration for Mandarin Speech Recognition

Tone Model Integration Based on Discriminative Weight Training for Putonghua Speech Recognition

Improved Speech Recognition Using Discriminative Integration of Multiple Local Classifiers in Lattice Rescoring

Tone modeling based on hidden conditional random fields and discriminative model weight training

Discriminative Combination of Multiple Local Classifiers in Lattice Rescoring

Tone Modeling Based on Discriminative Training for Mandarin Speech Recognition

Refining Context-Dependent Tonal Acoustic Modeling in Mandarin LVCSR

Exploiting Prosodic and Lexical Features for Tone Modeling in A Conditional Random Field Framework

Main vowel domain tone modeling with lexical and prosodic analysis for Mandarin ASR

Tone Model Integration Using Tree Based Weight Parameter Tying in Mandarin Speech Recognition

Competing Model Based Tone Evaluation for Mandarin Speech

Syllable-Based Acoustic Modeling with Lattice-Free MMI for Mandarin Speech Recognition

A Multi-Space Distribution (MSD) Approach to Speech Recognition of Tonal Languages

Tone Modeling for Continuous Mandarin Speech Recognition

Mixed Models Based Pronunciation Evaluation of Mandarin Tone

Automatic Context Induction for Tone Model Integration in Mandarin Speech Recognition

Integrated Tone Evaluation in Mandarin CALL Systems Using Competing Model Based Approach

Tone pronunciation quality scoring of Mandarin multi-syllable words

Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech