NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation

Niu Guanchen
2024-06-17
Abstract: Talking head generation based on the neural radiance field (NeRF) model has shown promising visual effects. However, the slow rendering speed of NeRF seriously limits its application, because synthesizing a single pixel requires a burdensome computation over hundreds of sampled points. In this work, a novel Neural Light Dynamic Fields (NLDF) model is proposed to generate high-quality 3D talking faces with a significant speedup. NLDF represents light fields based on light segments, and a deep network learns the information of an entire light beam at once. During training, knowledge distillation is applied: the NeRF-based synthesized result is used to guide the correct coloration of light segments in NLDF. Furthermore, a novel active pool training strategy is proposed to focus on high-frequency movements, particularly of the speaker's mouth and eyebrows. The proposed method effectively represents facial light dynamics in 3D talking video generation, and it runs approximately 30 times faster than the state-of-the-art NeRF-based method with comparable visual quality.
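To make the cost difference described in the abstract concrete, the following Python sketch contrasts standard NeRF volume rendering, where every sample along a ray needs its own network query before alpha compositing, with a hypothetical light-segment input that packs a whole ray into one vector for a single query. The function names and the equal-length, endpoint-concatenation encoding are illustrative assumptions; the abstract does not specify NLDF's exact segment parameterization.

```python
import math


def nerf_pixel_color(sigmas, colors, deltas):
    """Standard NeRF volume-rendering quadrature for a single ray.

    sigmas[i], colors[i] and deltas[i] are the density, RGB color and
    interval length of the i-th sample. In NeRF each (sigma, color) pair
    costs one MLP query, so N samples mean N queries per pixel.
    """
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha          # its share of the pixel color
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha            # light surviving past it
    return rgb


def light_segment_input(origin, direction, near, far, num_segments):
    """Hypothetical light-segment encoding for a single ray.

    Splits [near, far] into num_segments equal segments and concatenates
    the 3D endpoints of all segments into one flat vector, so a single
    network evaluation can predict the color of the whole ray.
    """
    step = (far - near) / num_segments
    features = []
    for k in range(num_segments + 1):
        t = near + k * step
        features.extend(o + t * d for o, d in zip(origin, direction))
    return features
```

With 128 samples per ray, `nerf_pixel_color` corresponds to 128 MLP queries for one pixel, whereas `light_segment_input([0, 0, 0], [0, 0, 1], 2.0, 6.0, 8)` produces a single 27-value input (9 endpoints × 3 coordinates) for one query.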
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper attempts to solve is **how to speed up 3D talking head generation based on Neural Radiance Fields (NeRF) while maintaining high-quality visual results**. Specifically:

1. **Slow rendering speed**: To render each pixel of a 3D talking head, a conventional NeRF model must query and composite radiance information at hundreds of sampled points along the camera ray. This makes rendering extremely slow and hard to deploy in practical applications: for example, generating a 30-second video at 512×512 resolution takes about 7 hours.
2. **Capturing facial detail in dynamic scenes**: In talking head generation, accurately capturing high-frequency motion regions such as the mouth and eyebrows is crucial, but existing methods handle these regions poorly.

To solve these problems, the paper proposes the **Neural Light Dynamic Fields (NLDF)** model, which achieves a large speedup while preserving generation quality in the following ways:

- **Light-segment-based representation**: NLDF decomposes each light ray into multiple light segments and uses a deep network to learn the information of the entire light beam at once, avoiding the costly point-by-point synthesis of conventional NeRF.
- **Knowledge distillation**: A pre-trained NeRF model serves as the teacher network; its rendered results guide NLDF in learning the colors of light segments, ensuring the quality of the generated results.
- **Active pool training strategy**: A new training strategy concentrates on high-frequency motion regions (such as the mouth and eyebrows) to improve generation accuracy there.
- **Deep network design**: An 88-layer ResMLP network strengthens the model's ability to learn complex light-beam information, especially for subtle movements such as blinking.
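The summary does not spell out how distillation and the active pool are implemented, so the following is only a minimal, hypothetical Python sketch under stated assumptions: the teacher NeRF's rendered colors act as pseudo ground truth for a mean-squared-error loss, and the active pool is modeled as a set of pixel coordinates (for example from a mouth/eyebrow mask) that is oversampled in every batch and refreshed with the currently highest-error pixels. `distillation_loss`, `ActivePool`, `active_fraction`, and the error-based refresh rule are all illustrative assumptions, not the paper's code.

```python
import random


def distillation_loss(student_rgb, teacher_rgb):
    """Mean squared error between the student's (NLDF) predicted colors and
    the colors a pre-trained teacher NeRF rendered for the same rays."""
    total = 0.0
    for s, t in zip(student_rgb, teacher_rgb):
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / (3 * len(student_rgb))


class ActivePool:
    """Hypothetical active pool of pixel coordinates.

    A fixed fraction of every training batch is drawn from the pool (for
    example, pixels inside a mouth/eyebrow mask); the rest is drawn
    uniformly from the whole frame. update() refills the pool with the
    pixels whose distillation error is currently largest.
    """

    def __init__(self, all_pixels, active_pixels, active_fraction=0.5, seed=0):
        self.all_pixels = list(all_pixels)
        self.pool = list(active_pixels)
        self.active_fraction = active_fraction
        self.rng = random.Random(seed)

    def sample(self, batch_size):
        # Fall back to uniform sampling if the pool is empty.
        n_active = int(batch_size * self.active_fraction) if self.pool else 0
        batch = self.rng.choices(self.pool, k=n_active)
        batch += self.rng.choices(self.all_pixels, k=batch_size - n_active)
        return batch

    def update(self, pixel_errors, keep):
        """pixel_errors maps pixel -> current error; keep the `keep` worst."""
        ranked = sorted(pixel_errors, key=pixel_errors.get, reverse=True)
        self.pool = ranked[:keep]
```

With `active_fraction=0.5`, half of every batch comes from the pool, so mouth and eyebrow pixels are visited far more often than their small share of the frame would otherwise allow.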
Through these innovations, the NLDF model renders about 30 times faster than state-of-the-art NeRF-based methods while maintaining comparable, high-quality visual results.
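As a rough back-of-the-envelope check, if the roughly 30× speedup applied directly to the 7-hour NeRF figure quoted above (the two numbers may refer to different baselines, so this is only an estimate), rendering the same 30-second, 512×512 video would take about:

$$
t_{\text{NLDF}} \approx \frac{t_{\text{NeRF}}}{30} = \frac{7\,\text{h} \times 60\,\text{min/h}}{30} = 14\,\text{min}
$$

Under this estimate, 14 minutes (840 s) is still about 28 times longer than the 30-second clip itself, so the speedup brings rendering much closer to real time without reaching it.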