Sem-Avatar: Semantic Controlled Neural Field for High-Fidelity Audio Driven Avatar.

Xiang Zhou, Weichen Zhang, Yikang Ding, Fan Zhou, Kai Zhang
DOI: https://doi.org/10.1007/978-981-99-8432-9_6
2024-01-01
Abstract: In this paper, we tackle the audio-driven avatar challenge by fitting a semantic controlled neural field to a talking-head video. While existing methods struggle with realism and head-torso inconsistency, our novel end-to-end framework, the semantic controlled neural field (Sem-Avatar), successfully overcomes these problems and delivers a high-fidelity avatar. Specifically, we devise a one-stage audio-driven forward deformation approach to ensure head-torso alignment. We further propose using a semantic mask as a control signal for eye opening, which makes the avatar appear more natural. We train our framework by comparing the rendered avatar to the original video. We also add a semantic loss that leverages a human face prior to stabilize training. Extensive experiments on public datasets demonstrate Sem-Avatar’s superior rendering quality and lip synchronization, establishing a new state of the art for audio-driven avatars.