Abstract: High-fidelity digital human representations are in growing demand, particularly for interactive telepresence, AR/VR, 3D graphics, and the rapidly evolving metaverse. Conventional methods for reconstructing 3D human motion work well in constrained capture spaces, but they frequently require expensive hardware and incur high processing costs. This study presents HumanAvatar, an approach that efficiently reconstructs accurate human avatars from monocular video. At the core of our method, we integrate the pre-trained HuMoR model, which is well established for human motion estimation, with the neural radiance field technique Instant-NGP and the articulated model Fast-SNARF to improve both reconstruction fidelity and speed. Combining these components yields a system that renders quickly and effectively while also providing highly accurate estimates of human pose parameters. We further add a posture-sensitive space reduction technique that balances rendering quality against computational efficiency. A detailed experimental analysis on both synthetic and real-world monocular videos demonstrates the performance of our approach: HumanAvatar consistently matches or surpasses state-of-the-art reconstruction techniques in quality, and it completes these reconstructions in minutes, a fraction of the time typically required by existing methods. Our models train 110× faster than state-of-the-art (SoTA) NeRF-based models, and under an identical runtime budget our technique noticeably outperforms SoTA dynamic human NeRF methods. HumanAvatar can produce usable renderings after only 30 seconds of training.
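The abstract does not describe how the posture-sensitive space reduction works. As a hedged illustration only, one common way to limit where a radiance field is queried is to bound the sampling region by the current pose: build a padded axis-aligned box around the posed 3D joints (for example, joints from a fitted body model) and discard ray samples that fall outside it. The function names, the `margin` parameter, and the box-based criterion below are assumptions for illustration, not HumanAvatar's actual technique.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def pose_bbox(joints: Sequence[Point], margin: float = 0.1) -> Box:
    """Hypothetical helper: axis-aligned box around posed joints, padded by `margin`."""
    xs, ys, zs = zip(*joints)
    lo = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    hi = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return lo, hi


def sample_ray(origin: Point, direction: Point,
               near: float, far: float, n: int) -> List[Point]:
    """Uniformly spaced sample points along a ray between depths `near` and `far`."""
    step = (far - near) / (n - 1)
    return [
        tuple(o + (near + i * step) * d for o, d in zip(origin, direction))
        for i in range(n)
    ]


def keep_mask(samples: Sequence[Point], box: Box) -> List[bool]:
    """True for samples inside the pose box; only these would be sent to the field."""
    lo, hi = box
    return [all(l <= c <= h for c, l, h in zip(p, lo, hi)) for p in samples]


if __name__ == "__main__":
    # Assumed toy pose: three joints near the origin.
    joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, 0.5, 0.2)]
    box = pose_bbox(joints, margin=0.1)
    samples = sample_ray((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), near=0.0, far=6.0, n=7)
    mask = keep_mask(samples, box)
    print(mask)  # only the sample at z = 0 lies inside the box
```

Because the box follows the joints frame by frame, the number of field queries scales with the body's extent rather than with the whole scene volume; finer-grained schemes (for example, occupancy grids) trade extra bookkeeping for tighter bounds.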