Mining Audio/Visual Database For Speech Driven Face Animation

Yiqiang Chen, Wen Gao, Zhaoqi Wang, Jun Miao, Dalong Jiang
DOI: https://doi.org/10.1109/ICSMC.2001.972962
2001-01-01
Abstract: In this paper, we present a data-mining framework for audio-visual interaction and apply it to a speech-driven lip-motion facial animation system. First, an unsupervised clustering algorithm is proposed to build a set of clusters, each of which groups similar facial configurations. Then a statistical visual model is constructed by specifying all possible cluster trajectories. The audio is analyzed with respect to the learned clusters of facial gestures. For every cluster, two neural networks are trained: one maps audio features to the cluster label, and the other maps them to the velocity. Given a new vocal track, the statistical visual model and the neural networks are combined to analyze the control audio, yielding the most likely facial state sequence. The proposed method not only automatically incorporates vocal and facial dynamics such as co-articulation, but is also easy to train, robust, extensible, and interpretable. Two evaluation approaches are also proposed. The evaluation of our system shows that the proposed learning algorithm is suitable for this task and greatly improves the realism of face animation during speech.
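The abstract does not say how the statistical visual model and the audio networks are combined to obtain the most likely facial state sequence. One plausible reading, sketched below purely as an assumption, treats the learned clusters as hidden states: cluster-to-cluster transition probabilities are estimated from training label sequences (standing in for the "cluster trajectories"), and per-frame audio-derived cluster scores (a hypothetical stand-in for the label network's outputs) are decoded with the Viterbi algorithm. The function names `estimate_transitions` and `viterbi`, the Laplace smoothing, and the use of raw network scores as emission scores are all illustrative choices, not the authors' method; the velocity network is omitted.

```python
import math


def estimate_transitions(label_seqs, n_clusters, smoothing=1.0):
    """Estimate a cluster-to-cluster transition matrix from training label sequences.

    Assumption: each sequence lists the cluster label of consecutive video frames.
    Add-`smoothing` (Laplace) counts keep unseen transitions possible.
    """
    counts = [[smoothing] * n_clusters for _ in range(n_clusters)]
    for seq in label_seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def viterbi(frame_scores, trans, prior):
    """Return the most likely cluster sequence under a first-order transition model.

    frame_scores[t][k]: hypothetical audio-derived score for cluster k at frame t,
    used directly as the emission score (a simplification).
    """
    n = len(prior)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Log-score of the best path ending in each cluster at the first frame.
    score = [log(prior[k]) + log(frame_scores[0][k]) for k in range(n)]
    back = []  # back[t][j]: best predecessor of cluster j at frame t + 1
    for obs in frame_scores[1:]:
        ptrs, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            ptrs.append(best_i)
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(obs[j]))
        back.append(ptrs)
        score = new_score
    # Trace back from the best final cluster.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    trans = estimate_transitions([[0, 0, 1, 1, 2, 2], [0, 1, 2]], n_clusters=3)
    frames = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
    print(viterbi(frames, trans, prior=[1 / 3] * 3))  # [0, 1, 2]
```

Decoding over the whole sequence, rather than taking the best-scoring cluster frame by frame, lets the transition model smooth out implausible jumps between facial configurations, which is one way such a system could capture co-articulation effects.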