Abstract: High-fidelity digital human representations are increasingly in demand, particularly for interactive telepresence, AR/VR, 3D graphics, and the rapidly evolving metaverse. Conventional methods for reconstructing 3D human motion work well in small capture spaces, but they frequently require expensive hardware and incur high processing costs. This study presents HumanAvatar, an approach that efficiently reconstructs precise human avatars from monocular video. At its core, we integrate the pre-trained HuMoR model for human motion estimation with the neural radiance field technique Instant-NGP and the articulated model Fast-SNARF to improve reconstruction fidelity and speed. Combining these components yields a system that renders quickly while also estimating human pose parameters with high accuracy. We further add a posture-sensitive space reduction technique that balances rendering quality against computational efficiency. In detailed experiments on both synthetic and real-world monocular videos, HumanAvatar consistently matches or surpasses contemporary state-of-the-art (SoTA) reconstruction techniques in quality. Furthermore, it achieves these reconstructions in minutes, a fraction of the time typically required by existing methods. Our models train 110× faster than SoTA NeRF-based models. Given an identical runtime budget, our technique performs noticeably better than SoTA dynamic human NeRF methods. HumanAvatar can produce effective visuals after only 30 seconds of training.
