Mapping between ultrasound and vowel speech using DNN framework

Xinyuan Zheng, Jianguo Wei, Wenhuan Lu, Qiang Fang, Jianwu Dang
DOI: https://doi.org/10.1109/ISCSLP.2014.6936700
2014-01-01
Abstract: Building a mapping between articulatory movements and the corresponding speech could greatly facilitate speech training and speech aids for voiceless patients. In this paper, we propose a deep learning framework for building a mapping between articulatory information, recorded by an ultrasound system, and the corresponding speech. The dataset includes six Chinese vowels. We use a Bimodal Deep Autoencoder algorithm based on restricted Boltzmann machines (RBMs) to learn the relationship between speech and articulation, namely the weight matrices of their representations. Speech and ultrasound images are reconstructed from the extracted features. The articulatory reconstruction error of our method is lower than that of a PCA-based approach, and the reconstructed speech is similar to the original. We also propose a mapping from ultrasound tongue images to the acoustic signal using a revised Denoising Autoencoder; the results show that it is a promising approach. A further experiment synthesizes ultrasound tongue images from speech, but its results still need improvement.
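The PCA baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the PCA comparison was set up, so everything here is an assumption: the function name `pca_reconstruction_error`, the use of mean squared error as the metric, and the synthetic array standing in for flattened ultrasound frames are all hypothetical and only show the general idea of measuring reconstruction error after projecting onto a few principal components.

```python
import numpy as np

def pca_reconstruction_error(frames, n_components):
    """Mean squared error after projecting frames onto the top principal components.

    frames: array of shape (n_samples, n_features), one flattened image per row.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    codes = centered @ basis.T             # low-dimensional features
    reconstructed = codes @ basis + mean   # map features back to pixel space
    return float(np.mean((frames - reconstructed) ** 2))

# Synthetic stand-in for flattened ultrasound frames (hypothetical data,
# not the paper's dataset): 60 samples, 40 features, 3 underlying factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
mixing = rng.normal(size=(3, 40))
frames = latent @ mixing + 0.01 * rng.normal(size=(60, 40))

err_1 = pca_reconstruction_error(frames, 1)
err_3 = pca_reconstruction_error(frames, 3)
```

On this toy data the error drops as more components are kept. A learned autoencoder would be compared against such a baseline at a matching feature dimension.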