Audio to Deep Visual: Speaking Mouth Generation Based on 3D Sparse Landmarks

Hui Fang, Dongdong Weng, Zeyu Tian, Zhen Song
DOI: https://doi.org/10.1109/vrw58643.2023.00145
2023-01-01
Abstract: A system that automatically generates a talking mouth in sync with input speech would enhance speech communication and enable many novel applications. This article presents a new model that generates 3D talking-mouth landmarks from Chinese speech. We model mouth motion with sparse 3D landmarks, which are easy to capture and provide sufficient lip accuracy. The 4D mouth motion dataset was collected with our self-developed facial capture device, filling a gap in Chinese speech-driven lip datasets. The experimental results show that the generated talking landmarks achieve accurate, smooth, and natural 3D mouth movements.