Abstract: This paper proposes and studies a framework for dialectal Chinese speech recognition. A relatively small corpus of dialectal Chinese speech (that is, Chinese as influenced by the speaker's native dialect) and dialect-related knowledge are used to transform a standard Chinese (Putonghua, abbreviated PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: expert knowledge and a small dialectal Chinese corpus. These sources provide information at four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. Wu dialectal Chinese (WDC) is taken as the example target language. The goal is to build a WDC speech recognizer from an existing PTH speech recognizer, based on the Initial-Final structure of the Chinese language and on a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF denotes either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and to combine them with supervised maximum likelihood linear regression (MLLR) acoustic model adaptation. The IF mappings introduce a multi-pronunciation lexicon, which can increase lexicon confusion and hence degrade performance. To limit the size of this lexicon, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10–18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH. The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even for other languages.
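The abstract does not spell out how MPE uses the accumulated uni-gram probability, so the following is only a minimal sketch under one assumed reading: words are ranked by uni-gram probability, the most frequent ones are kept until their accumulated probability reaches a threshold, and only those words receive extra pronunciations from the IF mappings. All function names, the threshold, the toy lexicon, and the mapping entries (retroflex Initials realized as dental sibilants) are hypothetical illustrations, not values from the paper.

```python
from itertools import product


def select_words_by_aup(unigram, threshold):
    """Pick the most frequent words until their accumulated
    uni-gram probability reaches `threshold` (assumed AUP criterion)."""
    selected, accumulated = [], 0.0
    ranked = sorted(unigram.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if accumulated >= threshold:
            break
        selected.append(word)
        accumulated += prob
    return selected


def expand_pronunciations(pron, if_map):
    """Return the canonical IF sequence first, followed by every variant
    obtained by substituting mapped Initials/Finals."""
    options = [[unit] + [v for v in if_map.get(unit, []) if v != unit]
               for unit in pron]
    return [tuple(choice) for choice in product(*options)]


def build_lexicon(base_lexicon, unigram, if_map, threshold):
    """Expand only the AUP-selected words; all other words keep
    their single canonical pronunciation."""
    chosen = set(select_words_by_aup(unigram, threshold))
    return {
        word: expand_pronunciations(pron, if_map) if word in chosen
        else [tuple(pron)]
        for word, pron in base_lexicon.items()
    }


# Hypothetical toy data: pronunciations as (Initial, Final) pairs.
base_lexicon = {
    "shi": ("sh", "i"),
    "zhong": ("zh", "ong"),
    "shan": ("sh", "an"),
}
unigram = {"shi": 0.5, "zhong": 0.3, "shan": 0.2}
if_map = {"zh": ["z"], "sh": ["s"]}  # illustrative IF mappings only

lexicon = build_lexicon(base_lexicon, unigram, if_map, threshold=0.7)
for word, prons in lexicon.items():
    print(word, prons)
```

With these toy values, "shi" and "zhong" each gain one variant pronunciation, while the lower-frequency "shan" keeps only its canonical form, so the lexicon grows less than it would under full expansion.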

Improved context-dependent acoustic modeling for continuous Chinese speech recognition

Context Dependent Initial/final Acoustic Modeling for Continuous Chinese Speech Recognition

Context Dependent Syllable Acoustic Model For Continuous Chinese Speech Recognition

Initial/final acoustic model based on separating nasal coda in Chinese Putonghua speech recognition

Research on Inter-Syllable Context-Dependent Acoustic Unit for Mandarin Continuous Speech Recognition

Acoustic Modeling Based On Chinese Phonetics Knowledge

Investigation of Modeling Units for Mandarin Speech Recognition Using DFSMN-CTC-sMBR

Research on Context-Dependent Acoustical Unit (Triphone) for Mandarin Continuous Speech Recognition

Deep neural networks for syllable based acoustic modeling in Chinese speech recognition

Modeling Pronunciation Variation Using Context-Dependent Weighting and B/S Refined Acoustic Modeling

A New Acoustic Modeling of Inter-Syllable Context-Dependent Units for Putonghua Continuous Speech Recognition

The Definition and Extension of the Question Set for Decision Tree Based State Tying in Chinese Speech Recognition

A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition

Intra-Syllable Dependent Phonetic Modeling for Chinese Speech Recognition

English Alphabet Recognition Based on Chinese Acoustic Modeling

Automatic Initial/Final Generation For Dialectal Chinese Speech Recognition

Algorithm for Mandarin Continuous Speech Recognition Based on Context-Dependent Unit Between Syllables

A Dialectal Chinese Speech Recognition Framework

Acoustic Modeling With DFSMN-CTC and Joint CTC-CE Learning

A two-layer lexical tree based beam search in continuous Chinese speech recognition

A comparable study of modeling units for end-to-end Mandarin speech recognition