Abstract: The choice of acoustic modeling units is critical to acoustic modeling in large vocabulary continuous speech recognition (LVCSR) tasks. Recent connectionist temporal classification (CTC) based acoustic models offer more options for the choice of modeling units. In this work, we propose a DFSMN-CTC-sMBR acoustic model and investigate various modeling units for Mandarin speech recognition. In addition to the commonly used context-independent Initial/Finals (CI-IF), context-dependent Initial/Finals (CD-IF), and syllables, we also propose hybrid Character-Syllable modeling units that mix high-frequency Chinese characters with syllables. Experimental results show that DFSMN-CTC-sMBR models with all of these types of modeling units significantly outperform well-trained conventional hybrid models. Moreover, we find that the proposed hybrid Character-Syllable modeling units are the best choice for CTC-based acoustic modeling for Mandarin speech recognition in our work, since they dramatically reduce substitution errors in the recognition results. In a 20,000-hour Mandarin speech recognition task, the DFSMN-CTC-sMBR system with hybrid Character-Syllable units achieves a character error rate (CER) of 7.45%, compared with 9.49% for the well-trained DFSMN-CE-sMBR system.
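The abstract describes the hybrid Character-Syllable units only as a mix of high-frequency Chinese characters and syllables. The following is a minimal sketch of one way such CTC targets could be built. It assumes that character frequency is counted over the training transcripts, that the `top_k` most frequent characters keep their own output unit, that every other character falls back to its tonal Pinyin syllable, and that each character has a single pronunciation, so polyphones are ignored. The function names, `top_k`, and the toy lexicon are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy lexicon: one tonal Pinyin syllable per character.
# Assumption: polyphonic characters are ignored (a real system would need
# context to choose the correct pronunciation).
TOY_LEXICON = {
    "我": "wo3", "们": "men5", "是": "shi4", "学": "xue2", "生": "sheng1",
    "中": "zhong1", "国": "guo2", "人": "ren2", "老": "lao3", "师": "shi1",
}


def build_hybrid_inventory(transcripts, top_k):
    """Return the set of the top_k most frequent characters in the transcripts.

    These characters are kept as their own CTC output units; all other
    characters are represented by their syllables.
    """
    counts = Counter(ch for line in transcripts for ch in line)
    return {ch for ch, _ in counts.most_common(top_k)}


def to_hybrid_units(text, char_units, lexicon):
    """Convert a character string into a hybrid Character-Syllable target sequence."""
    units = []
    for ch in text:
        if ch in char_units:
            units.append(ch)  # high-frequency character: modeled directly
        else:
            units.append(lexicon[ch])  # lower-frequency character: fall back to its syllable
    return units


if __name__ == "__main__":
    train = ["我们是学生", "我们是中国人", "我是老师"]
    char_units = build_hybrid_inventory(train, top_k=3)
    print(to_hybrid_units("我是老师", char_units, TOY_LEXICON))
    # prints ['我', '是', 'lao3', 'shi1']
```

With `top_k=3`, the frequent character 是 (shi4) remains a character unit, while the rarer 师 (shi1) is mapped to its syllable. Raising `top_k` moves the inventory toward pure characters, and lowering it moves the inventory toward pure syllables. The abstract does not state which threshold the authors used.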

Modeling Pronunciation Variation Using Context-Dependent Weighting and B/S Refined Acoustic Modeling

Mandarin Pronunciation Variation Modeling

Mandarin Pronunciation Modeling Based on CASS Corpus

Improved Context-Dependent Acoustic Modeling for Continuous Chinese Speech Recognition

Context Dependent Initial/Final Acoustic Modeling for Continuous Chinese Speech Recognition

Context Dependent Syllable Acoustic Model for Continuous Chinese Speech Recognition

Study on Framework for Chinese Pronunciation Variation Modeling

Initial/Final Acoustic Model Based on Separating Nasal Coda in Chinese Putonghua Speech Recognition

Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

Pronunciation Variation Modeling for Mandarin with Accent

Deep Neural Networks for Syllable Based Acoustic Modeling in Chinese Speech Recognition

Reducing Pronunciation Lexicon Confusion and Using More Data without Phonetic Transcription for Pronunciation Modeling

Investigation of Modeling Units for Mandarin Speech Recognition Using DFSMN-CTC-sMBR

Automatic Initial/Final Generation for Dialectal Chinese Speech Recognition

A New Acoustic Modeling of Inter-Syllable Context-Dependent Units for Putonghua Continuous Speech Recognition

Intra-Syllable Dependent Phonetic Modeling for Chinese Speech Recognition

An Innovative Prosody Modeling Method for Chinese Speech Recognition

Improving F0 Prediction Using Bidirectional Associative Memories and Syllable-Level F0 Features for HMM-Based Mandarin Speech Synthesis

Pronunciation Scoring Model for Mandarin Phonemes Based on Feature Comparison Using a Simulated Annealing Genetic Algorithm

Acoustic Modeling Based on Chinese Phonetics Knowledge

Robust F0 Modeling for Mandarin Speech Recognition in Noise