PortraitNeRF: A Single Neural Radiance Field for Complete and Coordinated Talking Portrait Generation

Pengfei Hu, Xiuzhe Wu, Yang Wu, Wenming Yang
DOI: https://doi.org/10.1109/icme57554.2024.10688062
2024-01-01
Abstract: We present PortraitNeRF, a novel framework that generates high-fidelity talking portrait videos for faithful, identity-preserving reenactment of source videos. This task is challenging because the generated results must look natural and match the speaker’s head movement, expression, eye blinks, and speech audio. To acquire sufficient guidance from the source video, PortraitNeRF exploits not only speech audio but also detailed motion information derived from visual data, including facial expressions, head pose, and head position. By adopting only a single neural radiance field, PortraitNeRF generates complete and coordinated portrait video without bells and whistles. Completeness is ensured by the single-NeRF structure, and the superior head-torso coordination comes from using head pose and position information as conditional input. Moreover, a simple yet effective mouth region emphasis strategy that fits well with the NeRF mechanism helps improve the accuracy of mouth shape. Experimental results and ablation studies demonstrate the superiority and effectiveness of PortraitNeRF.