Context-Dependent Initial/Final Acoustic Modeling for Continuous Chinese Speech Recognition

李净,郑方,张继勇,吴文虎
DOI: https://doi.org/10.3321/j.issn:1000-0054.2004.01.016
2000-01-01
Abstract: Acoustic modeling is very important for continuous Chinese speech recognition. An extended Initial/Final (XIF) set, chosen as the basic speech recognition unit set based on an analysis of the characteristics of the Chinese language, outperformed the standard IF set. Decision tree-based state tying was used to construct the context-dependent Initial/Final acoustic model (Tri-XIF model), with an appropriate question set designed from Chinese linguistic knowledge. Methods were developed to optimize the Tri-XIF modeling, including transcription refinement, question set extension, and model size reduction. Tests show that Tri-XIF modeling outperforms both Tri-phone modeling and syllable modeling, with the syllable error rate reduced by 24.53% relative to the Tri-phone model and by 41.65% relative to the syllable model. Applying these methods reduced the Tri-XIF model size by more than 20% with little performance deterioration.
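The relative reductions quoted above are conventionally computed as the drop in error rate divided by the baseline error rate. The following is a minimal sketch of that arithmetic, assuming this standard definition; the function name and the error rates in the usage note are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error reduction, in percent, of new_error against baseline_error.

    Both arguments are error rates in the same unit (for example, percent).
    This assumes the conventional definition (baseline - new) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For example, with hypothetical syllable error rates of 40.0% for a baseline model and 30.0% for a new model, `relative_reduction(40.0, 30.0)` gives 25.0, that is, a 25% relative reduction even though the absolute difference is only 10 points.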