Video Realistic Mouth Animation Based on an Audio Visual DBN Model with Articulatory Features and Constrained Asynchrony

Dongmei Jiang, Peizhen Liu, Ilse Ravyse, Hichem Sahli, Werner Verhelst
DOI: https://doi.org/10.1109/icig.2009.51
2009-01-01
Abstract: This paper presents a method for constructing mouth animations based on audio-visual DBN models with articulatory features (AF_AVDBN), in which the articulatory features of the lips, tongue, and glottis/velum may be asynchronous, within a maximum asynchrony constraint, to describe the speech production process more reasonably. Given an audio input and the trained AF_AVDBN models, an optimal visual feature learning algorithm is derived under the Maximum Likelihood Estimation criterion. The learned visual features are then used to construct the mouth images for the input speech. Objective and subjective evaluations on the mouth animations of 110 speech sentences show that the visual features learned from the AF_AVDBN models track the real visual features very closely, and that the constructed mouth images closely resemble the real ones.
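As a rough illustration of what a maximum asynchrony constraint can mean, the sketch below enumerates the joint positions of three articulatory streams and keeps only those whose state indices differ by no more than a fixed bound. The abstract does not define the constraint precisely, so the function name `allowed_joint_states`, the parameters `n_states` and `max_async`, and the "largest index minus smallest index" rule are all assumptions made for illustration, not the authors' formulation.

```python
from itertools import product

# Stream names follow the abstract; everything else here is hypothetical.
STREAMS = ("lips", "tongue", "glottis_velum")


def allowed_joint_states(n_states, max_async):
    """Return joint stream positions whose indices differ by at most max_async.

    Assumption: asynchrony is measured as the gap between the most advanced
    and the least advanced stream within a unit of n_states states.
    """
    joint = []
    for combo in product(range(n_states), repeat=len(STREAMS)):
        # Keep the combination only if no stream runs too far ahead.
        if max(combo) - min(combo) <= max_async:
            joint.append(dict(zip(STREAMS, combo)))
    return joint


if __name__ == "__main__":
    for bound in (0, 1, 2):
        states = allowed_joint_states(n_states=3, max_async=bound)
        print(f"max_async={bound}: {len(states)} joint states allowed")
```

With `max_async=0` the streams are forced to move in lockstep, and raising the bound enlarges the joint state space, which is the trade-off a constrained-asynchrony model has to balance.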