A Multimodal Approach of Generating 3D Human-Like Talking Agent.

Minghao Yang, Jianhua Tao, Kaihui Mu, Ya Li, Jianfeng Che
DOI: https://doi.org/10.1007/s12193-011-0073-5
2011-01-01
Journal on Multimodal User Interfaces
Abstract: This paper introduces a multimodal framework for generating a 3D human-like talking agent that can communicate with users through speech, lip movement, head motion, facial expression and body animation. In this framework, lip movements are obtained by searching and matching acoustic features, represented by Mel-frequency cepstral coefficients (MFCC), in an audio-visual bimodal database. Head motion is synthesized by visual prosody, which maps textual prosodic features into rotational and translational parameters. Facial expression and body animation are generated by transferring motion data to a skeleton. A simplified high-level Multimodal Marker Language (MML), in which only a few fields are used to coordinate the agent channels, is introduced to drive the agent. The experiments validate the effectiveness of the proposed multimodal framework.
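The lip-movement step described in the abstract (searching an audio-visual database for matching MFCC features) can be illustrated with a minimal sketch. The abstract does not specify the matching procedure, so the nearest-neighbour search with Euclidean distance used here is an assumption, and the function names, the three-dimensional "MFCC" vectors and the two-value lip parameters are all hypothetical toy data rather than anything taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical database entry: (MFCC vector, lip-shape parameters).
Entry = Tuple[Sequence[float], Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_lip_frames(
    query_mfcc: Sequence[Sequence[float]], database: Sequence[Entry]
) -> List[List[float]]:
    """For each query MFCC frame, return the lip parameters of the
    closest database entry (assumed nearest-neighbour matching)."""
    track = []
    for frame in query_mfcc:
        _, lips = min(database, key=lambda entry: euclidean(frame, entry[0]))
        track.append(list(lips))
    return track


# Toy audio-visual database (made-up values for illustration only).
database = [
    ([0.0, 0.0, 0.0], [0.1, 0.0]),
    ([1.0, 0.5, -0.2], [0.8, 0.3]),
    ([-0.7, 1.2, 0.4], [0.4, 0.9]),
]

# Two query frames of acoustic features from new speech.
query = [[0.9, 0.4, -0.1], [0.1, -0.1, 0.0]]
lip_track = match_lip_frames(query, database)
print(lip_track)
```

A real system would likely match over windows of frames and smooth the resulting lip track; this sketch matches each frame independently to keep the idea visible.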