Abstract: Accurate and reliable human motion reconstruction is crucial for creating natural interactions of full-body avatars in Virtual Reality (VR) and entertainment applications. As the Metaverse and social applications gain popularity, users are seeking cost-effective solutions to create full-body animations that are comparable in quality to those produced by commercial motion capture systems. In order to provide affordable solutions, though, it is important to minimize the number of sensors attached to the subject's body. Unfortunately, reconstructing the full-body pose from sparse data is a heavily under-determined problem. Some studies that use IMU sensors face challenges in reconstructing the pose due to positional drift and ambiguity of the poses. In recent years, some mainstream VR systems have released 6-degree-of-freedom (6-DoF) tracking devices providing positional and rotational information. Nevertheless, most solutions for reconstructing full-body poses rely on traditional inverse kinematics (IK) solutions, which often produce non-continuous and unnatural poses. In this article, we introduce SparsePoser, a novel deep learning-based solution for reconstructing a full-body pose from a reduced set of six tracking devices. Our system incorporates a convolutional-based autoencoder that synthesizes high-quality continuous human poses by learning the human motion manifold from motion capture data. Then, we employ a learned IK component, made of multiple lightweight feed-forward neural networks, to adjust the hands and feet toward the corresponding trackers. We extensively evaluate our method on publicly available motion capture datasets and with real-time live demos. We show that our method outperforms state-of-the-art techniques using IMU sensors or 6-DoF tracking devices, and can be used for users with different body dimensions and proportions.
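
The two-stage data flow described in the abstract (a generator that produces a full-body pose from six trackers, followed by small per-limb networks that adjust the end effectors) can be illustrated with a minimal sketch. This is not the SparsePoser implementation: the weights are random and untrained, plain dense layers stand in for the convolutional autoencoder, and the feature layout (3D position plus unit quaternion per tracker), the joint count, the latent width, and the limb-to-joint mapping are all assumptions chosen only for illustration.

```python
import math
import random

NUM_TRACKERS = 6   # stated in the abstract
TRACKER_FEATS = 7  # assumption: position (3) + unit quaternion (4)
NUM_JOINTS = 22    # assumption: illustrative skeleton size
LATENT = 16        # assumption: illustrative latent width
# Hypothetical mapping from each limb to the joint indices it refines.
LIMB_JOINTS = {
    "left_arm": (14, 15, 16),
    "right_arm": (18, 19, 20),
    "left_leg": (1, 2, 3),
    "right_leg": (5, 6, 7),
}


def make_layer(n_in, n_out, rng):
    """Random, untrained weight matrix: n_out rows of n_in weights."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]


def dense(weights, x, activate=True):
    """Matrix-vector product, optionally followed by tanh."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [math.tanh(o) for o in out] if activate else out


def normalize(q):
    """Scale a 4-vector to unit length so it is a valid rotation quaternion."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        return [1.0, 0.0, 0.0, 0.0]  # fall back to the identity rotation
    return [c / norm for c in q]


class TwoStageSketch:
    """Stage 1: coarse full-body pose. Stage 2: per-limb corrections."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_in = NUM_TRACKERS * TRACKER_FEATS
        self.encoder = make_layer(n_in, LATENT, rng)
        self.decoder = make_layer(LATENT, NUM_JOINTS * 4, rng)
        # One lightweight network per limb (stand-in for the learned IK).
        self.limb_nets = {
            limb: make_layer(LATENT + TRACKER_FEATS, len(joints) * 4, rng)
            for limb, joints in LIMB_JOINTS.items()
        }

    def reconstruct(self, trackers, limb_trackers):
        """trackers: 6 lists of 7 floats; limb_trackers: limb -> tracker index."""
        flat = [v for t in trackers for v in t]
        latent = dense(self.encoder, flat)
        raw = dense(self.decoder, latent, activate=False)
        pose = [raw[i * 4:(i + 1) * 4] for i in range(NUM_JOINTS)]
        for limb, joints in LIMB_JOINTS.items():
            # Each limb network sees the latent code plus its own tracker.
            x = latent + trackers[limb_trackers[limb]]
            delta = dense(self.limb_nets[limb], x, activate=False)
            for k, j in enumerate(joints):
                pose[j] = [p + d for p, d in
                           zip(pose[j], delta[k * 4:(k + 1) * 4])]
        return [normalize(q) for q in pose]


# Dummy input: six identical trackers with an identity rotation.
trackers = [[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0] for _ in range(NUM_TRACKERS)]
limb_trackers = {"left_arm": 1, "right_arm": 2, "left_leg": 3, "right_leg": 4}
pose = TwoStageSketch().reconstruct(trackers, limb_trackers)
print(len(pose))  # one unit quaternion per joint
```

The sketch only shows how sparse tracker input could flow through a coarse generator and then through limb-specific refiners. A trained system would additionally need temporal context, a loss on end-effector position, and skeleton-aware handling of different body proportions, none of which are modeled here.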

Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling

3D Human Pose Estimation from Deep Multi-View 2D Pose

High-precision Human Body Acquisition Via Multi-View Binocular Stereopsis

LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

Image-Based Synthesis for Deep 3D Human Pose Estimation

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies from Single RGB Images

Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers

3D Human Pose and Shape Estimation with Dense Correspondence from a Single Depth Image

Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation

Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation

EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

Monocular 3D Human Pose Estimation by Predicting Depth on Joints

RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

Towards Accurate Markerless Human Shape and Pose Estimation over Time

Real-time human pose recognition in parts from single depth images

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation