Emotional Semantic Neural Radiance Fields for Audio-Driven Talking Head.

Haodong Lin, Zhonghao Wu, Zhenyu Zhang, Chao Ma, Xiaokang Yang
DOI: https://doi.org/10.1007/978-3-031-20500-2_44
2022-01-01
Abstract: Generating audio-driven talking head videos is a challenging problem that has received considerable attention recently. However, the emotional expressions of the speaker are often ignored, even though emotion information is carried in the audio signal. In this paper, we propose Emotional Semantic Neural Radiance Fields (ES-NeRF), an audio-driven method for generating high-quality, emotional talking head videos based on neural radiance fields. Our method extracts content features and emotion features from the audio as additional inputs to construct a dynamic neural radiance field, applies a semantic segmentation map to constrain the speaker's expression, generates a dynamic three-dimensional emotional facial semantic representation, and then synthesizes the final high-quality video through a semantic translation network. Experiments show that, for audio containing different emotions, our method achieves high-quality results with corresponding expressions and surpasses the quality of state-of-the-art talking head methods.