Pronunciation Evaluation Based on a Phoneme-Dependent Posterior Probability Transformation

严可,魏思,戴礼荣,刘庆峰
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2011.09.004
2011-01-01
Abstract: The frame-normalized log posterior probability is a promising feature for pronunciation evaluation. However, this paper points out its deficiency and proves that this reflects the confusion between the acoustic model of the current pronunciation and the acoustic models in the probability space. To address this problem, a phoneme-based log posterior probability transformation method is proposed that uses both linear and non-linear transformations: the linear transformations have a closed-form solution, and the non-linear sigmoid transformations are trained by gradient descent. Tests on a live Putonghua database indicate that the method is effective. The cross correlation between the human and machine scores increases from 0.582 to 0.768 with posteriors calculated in the all-phoneme probability space, and from 0.696 to 0.773 with posteriors calculated in a probability space refined by typical error patterns.
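The linear case mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the training objective, so everything here is an assumption made for illustration: each phoneme p gets its own slope a[p] and offset b[p], the machine score of an utterance is the mean of the transformed log posteriors of its phonemes, and the parameters are fitted to human scores by ordinary least squares, which has a closed-form solution. The function names `fit_phoneme_linear` and `score` are hypothetical and do not come from the paper. The sigmoid variant, which the abstract says is trained by gradient descent, is not shown.

```python
import numpy as np


def fit_phoneme_linear(utterances, human_scores, n_phonemes):
    """Fit per-phoneme slopes and offsets by least squares (assumed objective).

    utterances: list of utterances, each a list of (phoneme_id, log_posterior).
    human_scores: one human score per utterance.
    Returns (a, b), two arrays of length n_phonemes.
    """
    # Design matrix: columns 0..P-1 multiply the slopes a[p],
    # columns P..2P-1 multiply the offsets b[p].
    X = np.zeros((len(utterances), 2 * n_phonemes))
    for i, utt in enumerate(utterances):
        for p, x in utt:
            X[i, p] += x / len(utt)
            X[i, n_phonemes + p] += 1.0 / len(utt)
    y = np.asarray(human_scores, dtype=float)
    # Closed-form least-squares solution (minimum-norm if X is rank deficient).
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta[:n_phonemes], theta[n_phonemes:]


def score(utt, a, b):
    """Machine score: mean of the phoneme-dependent transformed log posteriors."""
    return float(np.mean([a[p] * x + b[p] for p, x in utt]))
```

Because the assumed utterance score is linear in the parameters, a single least-squares solve is enough. A sigmoid transformation would make the score non-linear in its parameters, which is consistent with the abstract's use of gradient descent for that case.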