DNN Based Detection of Pronunciation Erroneous Tendency in Data Sparse Condition.

Yingming Gao, Yanlu Xie, Ju Lin, Jinsong Zhang
DOI: https://doi.org/10.1109/apsipa.2016.7820820
2016-01-01
Abstract: Detecting pronunciation erroneous tendency (PET) can provide second language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Because of data sparseness, DNN-HMM achieved only limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve performance through DNN-based feature extraction under data-sparse conditions. First, the probabilities of articulatory features derived from the top layer of a DNN were fed into a DNN-HMM. Second, bottleneck features (BNF) extracted from a middle hidden layer were concatenated with the original MFCCs and fed into an SGMM-HMM. The experimental results showed that the new features, converted from the original acoustic features by the DNN, were more discriminative, and that SGMM with BNF outperformed DNN in detecting PETs. The SGMM-HMM obtained the best detection results, achieving an FRR of 5.3%, an FAR of 29.6% and a DA of 90%.
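The two feature-extraction routes and the evaluation measures can be illustrated with a small NumPy sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the layer sizes, the randomly initialised weights standing in for a trained network, the linear (no activation) bottleneck layer, and the single softmax over hypothetical articulatory classes. The helper names `dnn_forward`, `append_bnf` and `pet_metrics` are also hypothetical. The metric definitions follow common usage in mispronunciation detection and are likewise assumed: FRR is the share of correctly pronounced units flagged as erroneous, FAR is the share of erroneous units accepted as correct, and DA is the share of all units decided correctly.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dnn_forward(feats, weights, bottleneck_index):
    """Feed-forward pass over frames (hypothetical helper).

    feats: (T, D) acoustic frames; weights: list of (W, b), one pair per layer.
    Returns (bottleneck activations, top-layer posteriors).
    Assumption: the bottleneck layer is linear and all other hidden layers use ReLU.
    """
    assert 0 <= bottleneck_index < len(weights) - 1
    h, bnf = feats, None
    for i, (W, b) in enumerate(weights):
        a = h @ W + b
        if i == len(weights) - 1:
            post = softmax(a)      # top layer: posteriors over (assumed) articulatory classes
        elif i == bottleneck_index:
            bnf = h = a            # narrow middle layer, kept linear, used as BNF
        else:
            h = relu(a)
    return bnf, post

def append_bnf(mfcc, bnf):
    # Frame-wise concatenation: every frame becomes [MFCC ; BNF].
    assert mfcc.shape[0] == bnf.shape[0]
    return np.concatenate([mfcc, bnf], axis=1)

def pet_metrics(ref_err, hyp_err):
    """FRR, FAR, DA under the assumed definitions (True = PET present / detected)."""
    ref = np.asarray(ref_err, dtype=bool)
    hyp = np.asarray(hyp_err, dtype=bool)
    frr = np.sum(hyp & ~ref) / np.sum(~ref)   # correct units wrongly rejected
    far = np.sum(~hyp & ref) / np.sum(ref)    # erroneous units wrongly accepted
    da = np.mean(hyp == ref)                  # all units decided correctly
    return frr, far, da

# Toy usage with hypothetical sizes: 39-dim MFCC input, 40-dim bottleneck, 20 classes.
rng = np.random.default_rng(0)
T, n_mfcc, n_art = 100, 39, 20
dims = [n_mfcc, 256, 40, 256, n_art]
weights = [(rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
           for n_in, n_out in zip(dims[:-1], dims[1:])]
mfcc = rng.standard_normal((T, n_mfcc))
bnf, post = dnn_forward(mfcc, weights, bottleneck_index=1)
combined = append_bnf(mfcc, bnf)
print(combined.shape)  # (100, 79)

# Toy decisions: FRR = 1/4, FAR = 1/2, DA = 4/6 for these six units.
frr, far, da = pet_metrics([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0])
```

In the paper, the posterior route (`post`) feeds a DNN-HMM and the concatenated route (`combined`) feeds an SGMM-HMM. Neither acoustic model is sketched here, and a real system would normally also splice neighbouring frames as DNN input.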