Acoustic to Articulatory Mapping with Deep Neural Network

Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng
DOI: https://doi.org/10.1007/s11042-014-2183-z
IF: 2.577
2014-01-01
Multimedia Tools and Applications
Abstract: Synthetic talking avatars have proven very useful in human-computer interaction. In this paper, we address the problem of acoustic-to-articulatory mapping and explore several kinds of models for describing the mapping function: the general linear model (GLM), Gaussian mixture model (GMM), artificial neural network (ANN), and deep neural network (DNN). Because the prediction stage of a neural network is fast enough to run in real time, we develop a real-time speech-driven talking avatar system based on the DNN. The system takes acoustic speech as input and outputs articulatory movements, synchronized with the input speech, on a three-dimensional avatar. Several experiments compare the performance of GLM, GMM, ANN, and DNN on MNGU0, a well-known English acoustic-articulatory speech corpus. The experimental results show that the proposed DNN-based acoustic-to-articulatory mapping method achieves the best performance of the four.