Abstract: Recent neural rendering methods have made great progress in generating photorealistic human avatars. However, these methods are generally conditioned only on low-dimensional driving signals (e.g., body poses), which are insufficient to encode the complete appearance of a clothed human, so they fail to generate faithful details. To address this problem, we exploit driving view images (e.g., in telepresence systems) as additional inputs. We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR++), which synthesizes 3D human avatars from arbitrary driving poses and views efficiently and at high quality while remaining faithful to appearance details. First, we learn to encode the driving signals of pose and view image on a dense UV manifold of the human body surface and extract UV-aligned features, preserving the structure of a skeleton-based parametric model. To handle complicated motions (e.g., self-occlusions), we then leverage the UV-aligned features to construct a 3D volumetric representation based on a dynamic neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose- and image-conditioned downsampled neural radiance field (PID-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with the rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar with a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR++ to handle complicated motions, render high-quality avatars under user-controlled poses/shapes, and, most importantly, be efficient at inference time. Our experiments also show state-of-the-art quantitative results.
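The hybrid idea described above (a coarse volumetric pass at low resolution, fused with 2D textural features in image space) can be illustrated with a minimal NumPy sketch. This is not the HVTR++ implementation: the function names `composite_features` and `fuse_features` are hypothetical, the compositing follows the standard NeRF quadrature rather than anything specific to PID-NeRF, channel-wise concatenation is only an assumed fusion operator, and the GAN-based textural renderer that would consume the fused map is omitted.

```python
import numpy as np

def composite_features(sigma, feats, deltas):
    """Alpha-composite per-sample features along each ray (standard NeRF quadrature).

    sigma:  (R, S) non-negative densities for R rays and S samples per ray
    feats:  (R, S, C) feature vectors at each sample
    deltas: (R, S) distances between consecutive samples
    Returns the composited features (R, C) and accumulated opacity (R,).
    """
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches sample i without being absorbed earlier.
    ones = np.ones_like(alpha[:, :1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[:, :-1]], axis=1), axis=1)
    weights = alpha * trans
    return (weights[..., None] * feats).sum(axis=1), weights.sum(axis=1)

def fuse_features(vol_feat, tex_feat):
    """Assumed fusion: concatenate volumetric and textural features per pixel."""
    return np.concatenate([vol_feat, tex_feat], axis=-1)

# Hypothetical sizes: a 4x4 low-resolution feature image, 8 samples per ray.
h, w, n_samples, c_vol, c_tex = 4, 4, 8, 3, 5
rng = np.random.default_rng(0)
sigma = rng.uniform(0.0, 2.0, size=(h * w, n_samples))
feats = rng.normal(size=(h * w, n_samples, c_vol))
deltas = np.full((h * w, n_samples), 0.1)

vol_feat, opacity = composite_features(sigma, feats, deltas)
vol_feat = vol_feat.reshape(h, w, c_vol)        # low-resolution volumetric feature map
tex_feat = rng.normal(size=(h, w, c_tex))       # stand-in for learned 2D textural features
fused = fuse_features(vol_feat, tex_feat)       # would be passed to a 2D upsampling renderer
```

Because the volumetric pass runs only on the small `h * w` ray grid, its cost scales with the low resolution, while the upsampling to the final image is left to a 2D network.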

HQ3DAvatar: High Quality Controllable 3D Head Avatar

HQ3DAvatar: High Quality Implicit 3D Head Avatar

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

HQ-Avatar: Towards High-Quality 3D Avatar Generation Via Point-based Representation

Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

TimeWalker: Personalized Neural Space for Lifelong Head Avatars

Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting

GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction

MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar

HVTR++: Image and Pose Driven Human Avatars Using Hybrid Volumetric-Textural Rendering

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos