ER-NeRF++: Efficient region-aware Neural Radiance Fields for high-fidelity talking portrait synthesis

Jiahe Li, Jiawei Zhang, Xiao Bai, Jun Zhou, Lin Gu
DOI: https://doi.org/10.1109/iccv51070.2023.00696
IF: 18.6
2024-05-09
Information Fusion
Abstract: Despite conditional Neural Radiance Fields (NeRF) achieving great success in modeling audio-driven talking portraits, the generation quality is increasingly hampered by the inefficient use of spatial information. This paper presents ER-NeRF, a novel conditional NeRF-based architecture for talking portrait synthesis, and its variant ER-NeRF++, to concurrently achieve fast convergence, real-time rendering, and state-of-the-art performance with a small model size. Inspired by the unequal contribution of spatial regions, we propose two modules in ER-NeRF to guide the talking portrait modeling: (1) A compact and expressive Tri-Plane Hash Representation to improve the accuracy of dynamic head reconstruction by pruning empty spatial regions with three planar hash encoders. (2) A Region Attention Module for audio-visual feature fusion, including a novel cross-modal attention mechanism that explicitly connects audio features with different spatial regions to provide local motion priors. Additionally, to tackle the difficulty of learning large facial motions, we propose a deformable variant, ER-NeRF++, which includes a Deformation Grid Transformer to enable the reuse of cross-regional spatial features for large motion representation. Compared to ER-NeRF, our ER-NeRF++ framework achieves a significant improvement in facial motion quality while maintaining fast training and real-time rendering. For the torso part, a direct Adaptive Pose Encoding is introduced to simplify the pose information for a better head-torso connection. Extensive experiments demonstrate that both of our proposed frameworks can efficiently render lifelike talking portrait videos with rich realistic details, performing better in image quality and audio-lip synchronization than previous methods.
computer science, artificial intelligence, theory & methods
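The abstract describes the Tri-Plane Hash Representation only as "three planar hash encoders". The sketch below illustrates that general idea under simplifying assumptions and is not the authors' implementation: a 3D point is projected onto the XY, YZ, and XZ planes, each projection is looked up in its own 2D hash table, and the three feature vectors are concatenated. The table size, feature width, grid resolution, hash function, and all function names are hypothetical choices made for illustration; a practical encoder would typically also use multiple resolutions and interpolate between neighboring cells, both of which are omitted here.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
TABLE_SIZE = 2 ** 14   # entries per planar hash table
FEATURE_DIM = 2        # features stored per entry
RESOLUTION = 64        # 2D grid resolution of each plane

# Multipliers for a simple XOR-based spatial hash (an assumed hash function).
PRIMES = np.array([1, 2654435761], dtype=np.uint64)


def hash_2d(cells: np.ndarray) -> np.ndarray:
    """Map non-negative integer 2D cell coordinates (N, 2) to table indices (N,)."""
    c = cells.astype(np.uint64)
    mixed = (c[:, 0] * PRIMES[0]) ^ (c[:, 1] * PRIMES[1])
    return (mixed % np.uint64(TABLE_SIZE)).astype(np.int64)


def encode_plane(table: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Nearest-cell lookup for 2D points uv in [0, 1]^2, shape (N, 2)."""
    cells = np.clip(np.floor(uv * RESOLUTION), 0, RESOLUTION - 1).astype(np.int64)
    return table[hash_2d(cells)]


def tri_plane_encode(tables: dict, xyz: np.ndarray) -> np.ndarray:
    """Encode 3D points in [0, 1]^3 by concatenating three planar lookups."""
    xy = encode_plane(tables["xy"], xyz[:, [0, 1]])
    yz = encode_plane(tables["yz"], xyz[:, [1, 2]])
    xz = encode_plane(tables["xz"], xyz[:, [0, 2]])
    return np.concatenate([xy, yz, xz], axis=-1)


rng = np.random.default_rng(0)
# In a trained model these tables would be learned; here they are random.
tables = {
    name: rng.normal(scale=1e-2, size=(TABLE_SIZE, FEATURE_DIM))
    for name in ("xy", "yz", "xz")
}
points = rng.random((5, 3))
features = tri_plane_encode(tables, points)
print(features.shape)
```

Because each encoder only sees a 2D projection, two points that differ only in z share the same XY-plane features in this sketch; the YZ and XZ lookups are what distinguish them.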