Pronunciation Erroneous Tendency Detection with Combination of Convolutional Neural Network and Long Short-Term Memory

Longfei Yang, Yanlu Xie, Jinsong Zhang
2019-01-01
Abstract: Computer Assisted Pronunciation Training (CAPT) systems can automatically detect pronunciation problems in the speech of second-language learners, and thus help them practice pronunciation more effectively. Pronunciation Erroneous Tendencies (PETs), which we proposed previously, consist of a set of articulation configurations describing incorrect articulation manners and positions; detecting them can yield more instructive guidance than the commonly used scoring approaches. Although earlier approaches have shown that PETs can be reliably detected with a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) or a Deep Neural Network-Hidden Markov Model (DNN-HMM), they also suggested that PET detection suffers seriously from acoustic variation and data sparsity. To alleviate these problems, we propose a series of techniques for PET detection in this paper. First, robust features are extracted by a convolutional layer to reduce spectral variation. Second, a Long Short-Term Memory (LSTM) model is employed to model PETs in order to handle variation along time. In addition, data augmentation is adopted to lessen the data sparsity. All proposed techniques were evaluated experimentally, and the results suggest that they are effective in the PET detection task.
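The combination described in the abstract, a convolutional layer to absorb spectral variation followed by an LSTM to track variation along time, can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 40-dimensional input frames, the kernel width, global max pooling over frequency, the hidden size, the three hypothetical PET classes, and the random (untrained) weights. The HMM component and the data augmentation step are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conv_over_frequency(frame, kernels):
    """Slide each 1-D kernel along the frequency axis, apply ReLU, then
    max-pool over all shifts so small spectral shifts change the output less."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            max(0.0, sum(kernel[j] * frame[i + j] for j in range(width)))
            for i in range(len(frame) - width + 1)
        ]
        features.append(max(responses))
    return features


def lstm_step(x, h, c, weights):
    """One LSTM time step; each gate sees the concatenation of x and h."""
    z = x + h  # list concatenation

    def gate(name, activation):
        rows, bias = weights[name]
        return [
            activation(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(rows, bias)
        ]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def random_params(n_kernels=4, kernel_width=5, hidden=8, n_classes=3, seed=0):
    """Hypothetical, untrained parameters; all sizes are illustrative."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": matrix(n_kernels, kernel_width),
        "lstm": {
            name: (matrix(hidden, n_kernels + hidden), [0.0] * hidden)
            for name in "ifog"
        },
        "out": (matrix(n_classes, hidden), [0.0] * n_classes),
    }


def cnn_lstm_posteriors(frames, params):
    """Return one posterior distribution over PET classes per input frame."""
    hidden = len(params["lstm"]["i"][1])
    h, c = [0.0] * hidden, [0.0] * hidden
    out_rows, out_bias = params["out"]
    posteriors = []
    for frame in frames:
        features = conv_over_frequency(frame, params["kernels"])
        h, c = lstm_step(features, h, c, params["lstm"])
        logits = [
            sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(out_rows, out_bias)
        ]
        posteriors.append(softmax(logits))
    return posteriors


# Usage: 10 synthetic frames of 40 spectral values each (stand-in data).
rng = random.Random(1)
frames = [[rng.random() for _ in range(40)] for _ in range(10)]
posteriors = cnn_lstm_posteriors(frames, random_params())
```

In a real system the frame-level posteriors would come from trained weights and feed a decoder; the sketch only shows how the two stages connect, with the convolution operating within each frame and the LSTM state carrying information across frames.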