Abstract: Recent works have shown that neural radiance fields (NeRFs) on top of parametric models have reached SOTA quality in building photorealistic head avatars from a monocular video. However, one major limitation of NeRF-based avatars is their slow rendering speed, caused by the dense point sampling of NeRF, which prevents broader use on resource-constrained devices. We introduce LightAvatar, the first head avatar model based on neural light fields (NeLFs). LightAvatar renders an image from 3DMM parameters and a camera pose via a single network forward pass, without using mesh or volume rendering. The proposed approach, while conceptually appealing, poses significant challenges for real-time efficiency and training stability. To resolve them, we introduce dedicated network designs to obtain proper representations for the NeLF model and maintain a low FLOPs budget. Meanwhile, we tap into a distillation-based training strategy that uses a pretrained avatar model as teacher to synthesize abundant pseudo data for training. A warping field network is introduced to correct the fitting error in the real data so that the model can learn better. Extensive experiments suggest that our method can achieve new SOTA image quality quantitatively or qualitatively, while being significantly faster than the counterparts, reporting 174.1 FPS (512x512 resolution) on a consumer-grade GPU (RTX3090) with no customized optimization.
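
To make the speed argument concrete, the following is a rough back-of-the-envelope sketch, not the paper's own analysis. It counts network evaluations per frame for volume rendering versus a light field, and converts the reported frame rate into a per-frame time. The value of 64 samples per ray is an assumption chosen for illustration, and the function names are hypothetical. The light-field count uses one evaluation per pixel ray, which is how NeLFs are usually described; the abstract itself states only that LightAvatar renders an image in a single forward pass.

```python
def nerf_queries(height, width, samples_per_ray):
    """Volume rendering: one network evaluation per sampled point on every pixel ray."""
    return height * width * samples_per_ray


def nelf_queries(height, width):
    """Light field: one network evaluation per pixel ray, no sampling along the ray."""
    return height * width


def frame_time_ms(fps):
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


H = W = 512            # resolution reported in the abstract
SAMPLES_PER_RAY = 64   # assumption for illustration, not a figure from the paper

print(nerf_queries(H, W, SAMPLES_PER_RAY))   # 16777216
print(nelf_queries(H, W))                    # 262144
print(round(frame_time_ms(174.1), 2))        # 5.74
```

Under these assumptions the light field needs 64 times fewer evaluations per frame, and 174.1 FPS corresponds to roughly 5.74 ms per frame.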

What problem does this paper attempt to address?

The paper asks how to build an efficient 3D head avatar model that renders high-quality images in real time, while avoiding the high computational cost and the dependence on explicit geometry found in existing methods based on neural radiance fields (NeRFs). Specifically:

1. **Improve rendering speed**: Existing NeRF-based avatar models must evaluate the network at many sampled points for each pixel, which makes rendering very slow and hard to apply in real-time scenarios or on resource-constrained devices. LightAvatar instead builds on a neural light field (NeLF) and completes rendering with a single network forward pass, which greatly improves rendering speed.
2. **Do not depend on explicit geometric structures**: Many high-quality avatar models rely on explicit 3DMM geometry, used through mesh or volume rendering. This provides good controllability and stability, but performs poorly when the 3DMM geometry is overly simplified or missing. LightAvatar does not rely on explicit geometric structures at all. It generates images directly from 3DMM parameters and a camera pose, which simplifies the model design and improves efficiency.
3. **Training stability and data synthesis**: Because it removes the dependence on explicit geometric structures, LightAvatar faces stability challenges during training, especially with monocular video data. To address this, the authors introduce a training strategy based on knowledge distillation: a pretrained teacher model generates pseudo data, which is combined with real data for joint training to improve the model's generalization ability and the final image quality.
4. **Correct fitting errors**: To further improve performance on real data, the authors introduce a warping field network that corrects the fitting errors in the real data, thereby improving overall image quality.

In summary, the paper aims to construct an efficient, high-quality 3D avatar model by introducing the neural light field (NeLF) together with a series of optimized designs, providing better performance and image quality in real-time applications.
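
The distillation-based data strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_teacher`, `make_pseudo_dataset`, `mixed_batch`, the 6-value pose, the 4-value "image", and the 25% real-data fraction are all hypothetical placeholders. The warping field network and the actual student network are omitted.

```python
import random


def toy_teacher(params, pose):
    """Hypothetical stand-in for a pretrained avatar model: returns a fake 4-pixel 'image'."""
    signal = sum(params) + sum(pose)
    return [signal * scale for scale in (0.1, 0.2, 0.3, 0.4)]


def make_pseudo_dataset(teacher, num_samples, param_dim, rng):
    """Label randomly sampled (3DMM params, camera pose) inputs with teacher renders."""
    dataset = []
    for _ in range(num_samples):
        params = [rng.uniform(-1.0, 1.0) for _ in range(param_dim)]
        pose = [rng.uniform(-1.0, 1.0) for _ in range(6)]  # assumed 6-value pose
        dataset.append({"params": params, "pose": pose,
                        "image": teacher(params, pose), "source": "pseudo"})
    return dataset


def mixed_batch(real, pseudo, batch_size, real_fraction, rng):
    """Draw one training batch that combines real and pseudo samples."""
    num_real = round(batch_size * real_fraction)
    batch = rng.choices(real, k=num_real) + rng.choices(pseudo, k=batch_size - num_real)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
# Placeholder "real" samples; in practice these would come from the monocular video.
real_data = [{"params": [0.0] * 4, "pose": [0.0] * 6, "image": [0.0] * 4, "source": "real"}
             for _ in range(10)]
pseudo_data = make_pseudo_dataset(toy_teacher, num_samples=100, param_dim=4, rng=rng)
batch = mixed_batch(real_data, pseudo_data, batch_size=8, real_fraction=0.25, rng=rng)
print(sum(s["source"] == "real" for s in batch), "real /", len(batch), "total")
```

The point of the sketch is that the teacher can label arbitrarily many sampled inputs, so the pseudo set can be far larger than the real footage. How the paper actually samples inputs and balances the two sources is not stated in this summary.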

LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field
