Improving pronunciation erroneous tendency detection with convolutional long short-term memory

Longfei Yang, Yanlu Xie, Yingming Gao, Jinsong Zhang
DOI: https://doi.org/10.1109/IALP.2017.8300544
2017-01-01
Abstract: In computer-assisted pronunciation teaching (CAPT), corrective feedback is much more desirable than a bare score because it gives learners more information for correcting their erroneous pronunciations. For this purpose, we previously proposed the pronunciation erroneous tendency (PET), which represents errors in terms of articulation manner and constriction place, and we implemented PET detection systems with Gaussian mixture models (GMM) and deep neural networks (DNN) in previous work [1–2]. However, achieving a high-performance system remains challenging because of the context dependency of PETs and the data sparseness problem. In this paper, we first introduce a data augmentation scheme to mitigate the data sparseness problem. To further improve performance, we propose combining LSTM and CNN into a unified system that takes advantage of both. Experimental results suggest that the proposed CNN-LSTM outperforms the other models from our previous work.