Mandarin Pronunciation Modeling Based on CASS Corpus.

Zheng Fang, Song Zhanjiang, Fung Pascale, William Byrne
DOI: https://doi.org/10.1007/bf02947304
IF: 1.871
2002-01-01
Journal of Computer Science and Technology
Abstract: Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of the Chinese language and developing the Bayesian equation, the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), GIF modeling and IF-GIF modeling, as well as context-dependent pronunciation weighting, are proposed based on a seed database with good phonetic transcription. Using these methods, the Chinese syllable error rate (SER) is reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used. The effectiveness of these methods is also demonstrated when more data without phonetic transcription are used to refine the acoustic model using the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.