SATFace: Subject Agnostic Talking Face Generation with Natural Head Movement

Shuai Yang, Kai Qiao, Shuhao Shi, Jie Yang, Dekui Ma, Guoen Hu, Bin Yan, Jian Chen
DOI: https://doi.org/10.1007/s11063-023-11272-7
IF: 2.565
2023-01-01
Neural Processing Letters
Abstract: Talking face generation is widely used in education, entertainment, shopping, and other social practices. Existing methods focus on matching the speaker’s mouth shape with the speech content, but there is little research on automatically extracting potential head motion features from speech, so the generated videos lack naturalness. This paper proposes SATFace, a subject agnostic talking face generation method with natural head movement. To model the complicated and critical features of a talking face (identity, background, mouth shape, head posture, etc.), we construct SATFace with an encoder-decoder as the primary network architecture. We then design a long short-time feature learning network that better exploits the global and local information in audio to generate reasonable head movement. In addition, a modular training process is proposed to improve the effectiveness and efficiency of learning explicit and implicit features. Experimental comparisons show that SATFace improves by at least about 9.8% in cumulative probability of blur detection and 8.2% in synchronization confidence compared with mainstream methods. Mean opinion scores show that SATFace has advantages in lip sync quality, head movement naturalness, and video realness.