Text-driven Visual Prosody Generation for Embodied Conversational Agents

Jiali Chen, Yong Liu, Zhimeng Zhang, Changjie Fan, Yu Ding
DOI: https://doi.org/10.1145/3308532.3329445
2019-01-01
Abstract: In face-to-face conversations, head motions play a crucial role in encoding information, and humans are highly skilled at decoding multiple messages from their interlocutors' head motions. It is therefore important to give embodied conversational agents (ECAs) the ability to convey communicative intention through head movements. This work aims to automatically synthesize head motions for an ECA speaking Chinese. We propose a statistical framework that computes head movements from transcripts alone. Subjective experiments are conducted to validate the proposed framework. The results show that the generated head animation improves perceived naturalness and is synchronized with the synthetic speech input.