Abstract: Talking head generation based on neural radiance fields (NeRF) has shown promising visual results. However, NeRF's slow rendering speed seriously limits its application, because synthesizing a single pixel requires a burdensome computation over hundreds of sampled points. In this work, a novel Neural Light Dynamic Fields (NLDF) model is proposed to generate high-quality 3D talking faces with a significant speedup. NLDF represents light fields based on light segments, and a deep network is used to learn the entire light beam's information at once. During training, knowledge distillation is applied: the NeRF-based synthesized result is used to guide the correct coloration of light segments in NLDF. Furthermore, a novel active pool training strategy is proposed to focus on high-frequency movements, particularly of the speaker's mouth and eyebrows. The proposed method effectively represents facial light dynamics in 3D talking video generation, and it runs approximately 30 times faster than the state-of-the-art NeRF-based method, with comparable visual quality.
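The abstract's core idea, representing a ray as light segments and letting one deep network consume the whole beam in a single pass, can be sketched as follows. This is a minimal, untrained illustration under stated assumptions: uniform segment spacing, flattening the segment endpoints into one input vector, a width of 16, ReLU residual blocks, and a sigmoid RGB head are all choices made here, not details from the paper, and `ray_segments` and `ResMLP` are hypothetical names. The block count is picked so the network totals 88 linear layers, echoing the 88-layer ResMLP described in the answer below, although how the paper counts its layers is not stated.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def ray_segments(origin: Vec3, direction: Vec3, near: float, far: float,
                 num_segments: int) -> List[Tuple[Vec3, Vec3]]:
    """Split [near, far] along a ray into equal-length segments (uniform spacing is assumed)."""
    step = (far - near) / num_segments

    def point(t: float) -> Vec3:
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        return (x, y, z)

    return [(point(near + i * step), point(near + (i + 1) * step))
            for i in range(num_segments)]


def linear(x: Sequence[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


class ResMLP:
    """Hypothetical residual MLP: input layer, `num_blocks` two-layer residual blocks, output layer."""

    def __init__(self, in_dim: int, width: int, num_blocks: int,
                 out_dim: int = 3, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Layer:
            scale = 1.0 / math.sqrt(cols)
            w = [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
            return w, [0.0] * rows

        self.inp = init(width, in_dim)
        self.blocks = [(init(width, width), init(width, width)) for _ in range(num_blocks)]
        self.out = init(out_dim, width)

    def __call__(self, x: Sequence[float]) -> List[float]:
        h = relu(linear(x, self.inp))
        for first, second in self.blocks:
            # Skip connection: h <- h + W2 * relu(W1 * h)
            delta = linear(relu(linear(h, first)), second)
            h = [a + d for a, d in zip(h, delta)]
        # Sigmoid keeps each RGB channel in (0, 1).
        return [1.0 / (1.0 + math.exp(-v)) for v in linear(h, self.out)]


# One ray -> 8 segments -> one flat input vector describing the whole beam.
segments = ray_segments((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), near=1.0, far=3.0, num_segments=8)
beam = [c for start, end in segments for c in (*start, *end)]  # 8 * 6 = 48 values
model = ResMLP(in_dim=len(beam), width=16, num_blocks=43)      # 1 + 2*43 + 1 = 88 linear layers
rgb = model(beam)
print(rgb)
```

The design point the sketch isolates is that the network runs once per ray, whereas NeRF evaluates its MLP at every sample along the ray and then composites the results.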

What problem does this paper attempt to address?

The main problem this paper addresses is **how to speed up 3D talking head generation based on Neural Radiance Fields (NeRF) while maintaining high-quality visual results**. Specifically:

1. **Slow rendering**: To render a 3D talking head, a conventional NeRF model must compute radiance at hundreds of sample points for every pixel, which makes rendering extremely slow and hinders practical use. For example, generating a 30-second video at 512×512 resolution takes about 7 hours.
2. **Capturing facial detail in dynamic scenes**: Talking head generation depends on accurately capturing high-frequency motion regions such as the mouth and eyebrows, and existing methods fall short here.

To address these problems, the paper proposes the **Neural Light Dynamic Fields (NLDF)** model, which achieves a large speedup while preserving visual quality through the following components:

- **Light-segment-based representation**: NLDF divides each light ray into multiple light segments and uses a deep network to learn the information of the entire light beam at once, avoiding NeRF's costly point-by-point synthesis.
- **Knowledge distillation**: A pre-trained NeRF model serves as the teacher network; its rendered results guide NLDF toward the correct colors for the light segments, which safeguards output quality.
- **Active pool training strategy**: A new training strategy concentrates on high-frequency motion regions (such as the mouth and eyebrows) to improve generation accuracy there.
- **Deep network design**: An 88-layer ResMLP network strengthens the model's capacity to learn complex light-beam information, particularly subtle movements such as blinking.
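The two training-side ideas above, distilling from a NeRF teacher and the active pool, can be sketched together. The text does not say how the pool is populated, so what follows is one hypothetical reading, not the paper's algorithm: rays with the largest distillation loss are kept in a bounded pool, and each batch mixes pool rays with uniformly sampled ones, so fast-changing regions such as the mouth are revisited more often. `distillation_loss`, `ActivePool`, `make_batch`, `pool_fraction`, and the toy teacher and student are illustrative stand-ins; in the real setting the teacher colors would come from a pre-trained NeRF's renders and the student would be the NLDF network.

```python
import heapq
import random
from typing import Dict, List, Sequence, Tuple

Color = Tuple[float, float, float]


def distillation_loss(student: Color, teacher: Color) -> float:
    """Mean squared error between the student's color and the teacher's rendered color."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


class ActivePool:
    """Hypothetical pool that keeps the `capacity` rays with the largest recent loss."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.losses: Dict[int, float] = {}

    def update(self, ray_ids: Sequence[int], losses: Sequence[float]) -> None:
        for rid, loss in zip(ray_ids, losses):
            self.losses[rid] = loss  # overwrite with the most recent loss
        if len(self.losses) > self.capacity:
            hardest = heapq.nlargest(self.capacity, self.losses.items(), key=lambda kv: kv[1])
            self.losses = dict(hardest)

    def sample(self, k: int, rng: random.Random) -> List[int]:
        ids = list(self.losses)
        return rng.sample(ids, min(k, len(ids)))


def make_batch(num_rays: int, batch_size: int, pool_fraction: float,
               pool: ActivePool, rng: random.Random) -> List[int]:
    """Mix hard rays from the pool with uniformly sampled rays."""
    from_pool = pool.sample(int(batch_size * pool_fraction), rng)
    from_uniform = rng.sample(range(num_rays), batch_size - len(from_pool))
    return from_pool + from_uniform


# Toy setup: rays 40-49 stand in for a fast-moving mouth region.
NUM_RAYS = 100
teacher = {r: (1.0, 0.0, 0.0) if 40 <= r < 50 else (0.5, 0.5, 0.5) for r in range(NUM_RAYS)}


def student(ray_id: int) -> Color:
    return (0.5, 0.5, 0.5)  # an untrained student predicts gray everywhere


rng = random.Random(0)
pool = ActivePool(capacity=10)
all_rays = list(range(NUM_RAYS))
pool.update(all_rays, [distillation_loss(student(r), teacher[r]) for r in all_rays])
batch = make_batch(NUM_RAYS, batch_size=20, pool_fraction=0.5, pool=pool, rng=rng)
print(sorted(pool.losses))  # → [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
```

Because each update overwrites a ray's loss, rays the student has already learned drop out of the pool once harder rays are reported.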
Through these innovations, the NLDF model renders about 30 times faster than a state-of-the-art NeRF-based method while maintaining comparable visual quality.
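A rough count of network queries shows where NeRF's cost comes from. The sketch assumes 192 samples per ray as a stand-in for "hundreds of sampled points" and one query per ray for a light-field network; both are illustrative values, not figures reported in the paper, and `queries_per_frame` is a hypothetical helper.

```python
# Illustrative constants (assumptions, not values reported in the paper).
WIDTH, HEIGHT = 512, 512          # frame resolution
NERF_SAMPLES_PER_RAY = 192        # stand-in for "hundreds of sampled points"
LIGHT_FIELD_QUERIES_PER_RAY = 1   # one network pass per ray/beam


def queries_per_frame(width: int, height: int, queries_per_ray: int) -> int:
    """Network evaluations needed to render one frame (one ray per pixel)."""
    return width * height * queries_per_ray


nerf = queries_per_frame(WIDTH, HEIGHT, NERF_SAMPLES_PER_RAY)
light_field = queries_per_frame(WIDTH, HEIGHT, LIGHT_FIELD_QUERIES_PER_RAY)

print(f"NeRF queries per frame:        {nerf:,}")                # 50,331,648
print(f"Light-field queries per frame: {light_field:,}")         # 262,144
print(f"Query-count ratio:             {nerf // light_field}x")  # 192x
```

The query ratio is not the wall-clock speedup: each light-field query runs a much deeper network (88 layers in NLDF), so the reported end-to-end gain of about 30× is, presumably for that reason among others, well below the raw query ratio.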

NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation