Abstract: Generating immersive virtual reality avatars is a challenging task in VR/AR applications, which maps physical human body poses to avatars in virtual scenes for an immersive user experience. However, most existing work is time‐consuming and limited by datasets, which does not satisfy immersive and real‐time requirements of VR systems. In this paper, we aim to generate 3D real‐time virtual reality avatars based on a monocular camera to solve these problems. Specifically, we first design a self‐attention distillation network (SADNet) for effective human pose estimation, which is guided by a pre‐trained teacher. Secondly, we propose a lightweight pose mapping method for human avatars that utilizes the camera model to map 2D poses to 3D avatar keypoints, generating real‐time human avatars with pose consistency. Finally, we integrate our framework into a VR system, displaying generated 3D pose‐driven avatars on Helmet‐Mounted Display devices for an immersive user experience. We evaluate SADNet on two publicly available datasets. Experimental results show that SADNet achieves a state‐of‐the‐art trade‐off between speed and accuracy. In addition, we conducted a user experience study on the performance and immersion of virtual reality avatars. Results show that pose‐driven 3D human avatars generated by our method are smooth and attractive.
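The abstract says the pose mapping step uses the camera model to map 2D poses to 3D avatar keypoints, but gives no details. As a rough illustration only, the sketch below back-projects 2D pixel keypoints into camera space with a standard pinhole model. It is not the paper's method: the function name `backproject_keypoints`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the per-keypoint depths are all assumptions made for this example, and a monocular system would have to obtain depth from some other source.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def backproject_keypoints(
    keypoints_px: List[Point2D],
    depths_m: List[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift 2D pixel keypoints to 3D camera-space points (pinhole model).

    Hypothetical helper for illustration; depths are assumed to be known.
    """
    if len(keypoints_px) != len(depths_m):
        raise ValueError("one depth value is required per keypoint")
    points: List[Point3D] = []
    for (u, v), z in zip(keypoints_px, depths_m):
        # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


# Assumed intrinsics for a 640x480 camera (illustrative values only).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# Three made-up 2D keypoints, all placed at an assumed depth of 2 m.
pose_2d = [(320.0, 240.0), (420.0, 240.0), (320.0, 140.0)]
depths = [2.0, 2.0, 2.0]

pose_3d = backproject_keypoints(pose_2d, depths, FX, FY, CX, CY)
for point in pose_3d:
    print(point)
```

A keypoint at the principal point maps onto the optical axis, while offsets in pixels scale with depth divided by focal length. In an avatar pipeline the resulting camera-space points would still need to be retargeted to the avatar's skeleton, which this sketch does not attempt.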

SADNet: Generating immersive virtual reality avatars by real‐time monocular pose estimation

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

Fast Registration of Photorealistic Avatars for VR Facial Animation

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

HybridAvatar: Efficient Mesh-based Human Avatar Generation from Few-Shot Monocular Images with Implicit Mesh Displacement

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field

HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations

CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning

AvatarReX: Real-time Expressive Full-body Avatars

MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

Efficient Neural Implicit Representation for 3D Human Reconstruction

HVTR++: Image and Pose Driven Human Avatars Using Hybrid Volumetric-Textural Rendering

Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

Universal Facial Encoding of Codec Avatars from VR Headsets

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos

NECA: Neural Customizable Human Avatar