Tone pronunciation quality scoring of Mandarin multi-syllable words

Junbo Zhang, Hemin Wu, Yonghong Yan
DOI: https://doi.org/10.1109/ICOSP.2010.5656359
2010-01-01
Abstract: This paper discusses tone pronunciation scoring for Mandarin multi-syllable words in Computer Assisted Language Learning (CALL) systems. A commonly used tone evaluation method models pitch sequences with Gaussian mixture models (GMMs). Because the pitch pattern changes considerably in multi-syllable context, tone models trained on a mono-tone database perform poorly on multi-syllable speech: scoring accuracy drops sharply because of tone sandhi when moving from mono-syllable to multi-syllable words. We propose three main methods to address the problem. The first is to train the GMM on tri-syllable F0 traces instead of mono-syllable ones. The second is to model not only the trend of the F0 contour but also the F0 values, using normalization to ensure that the F0 values reflect tones. The third is to use linear regression to approximate the F0 contour trend. Some minor improvements are also introduced. With these methods, the tone recognition accuracy improves from 41% to 82%. © 2010 IEEE.
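As a rough illustration of the second and third ideas (F0 normalization and a linear fit to the contour trend), the following is a minimal sketch, not the authors' implementation. The abstract does not say which normalization is used; the z-scored log-F0 below, the function names, and the synthetic contours are all assumptions made for the example. The resulting (slope, level) pairs are the kind of feature that could then be passed to a GMM.

```python
import numpy as np

def normalize_f0(f0_hz):
    """Z-score log-F0 over an utterance (assumed scheme; the abstract names none)."""
    log_f0 = np.log(np.asarray(f0_hz, dtype=float))
    return (log_f0 - log_f0.mean()) / log_f0.std()

def contour_features(norm_f0):
    """Fit a straight line to one syllable's normalized F0.

    Returns (slope, mean level): the slope captures the contour trend,
    the mean captures the F0 value relative to the utterance.
    """
    y = np.asarray(norm_f0, dtype=float)
    t = np.linspace(0.0, 1.0, len(y))       # syllable time rescaled to [0, 1]
    slope, _intercept = np.polyfit(t, y, 1)  # degree-1 least-squares fit
    return slope, y.mean()

# Hypothetical two-syllable utterance in Hz: a rising contour, then a falling one.
rising = np.linspace(180.0, 240.0, 20)
falling = np.linspace(250.0, 150.0, 20)
norm = normalize_f0(np.concatenate([rising, falling]))

rise_slope, rise_level = contour_features(norm[:20])
fall_slope, fall_level = contour_features(norm[20:])
```

Normalizing over the whole utterance before splitting into syllables keeps each syllable's level comparable to its neighbours, which is one plausible way to make F0 values reflect tone rather than speaker pitch range.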