Abstract:

Background: Model-based 3D pose estimation is widely used in 3D human motion analysis, where vision-based and inertial-based approaches form two distinct lines of work. Multi-view images from a vision-based markerless capture system provide essential data for motion analysis, but erroneous estimates still occur because of ambiguities, occlusion, or image noise. Multi-view setups are also difficult to deploy in the wild. Inertial measurement units (IMUs) can measure orientation accurately and are unaffected by occlusion, but they are susceptible to magnetic field interference and drift. Hybrid motion capture has therefore drawn researchers' attention in recent years. Existing hybrid methods jointly optimize the 3D pose parameters by minimizing the discrepancy between the image and IMU data. However, these methods still suffer from complex peripheral devices, sensitivity to initialization, and slow convergence.

Methods: This article presents an approach that improves 3D human pose estimation by fusing a single image with sparse IMUs. Building on a dual-stream feature extraction network, we design a model-attention network with a residual module that tightly couples the dual-modal features from a static image and sparse IMUs. The final 3D pose and shape parameters are obtained directly by regression.

Results: Extensive experiments are conducted on two benchmark datasets for 3D human pose estimation. Compared with state-of-the-art methods, the per-vertex error (PVE) of the human mesh is reduced by 9.4 mm on the Total Capture dataset, and the mean per-joint position error (MPJPE) is reduced by 7.8 mm on the Human3.6M dataset. The quantitative comparison demonstrates that the proposed method can effectively fuse sparse IMU data with images and improve pose accuracy.
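The two metrics reported above are both mean Euclidean distances between corresponding predicted and ground-truth 3D points: joints for MPJPE and mesh vertices for PVE. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names are hypothetical, coordinates are assumed to be in millimetres, and no root or Procrustes alignment is applied, since the abstract does not state which evaluation protocol was used.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    pred, gt: equal-length sequences of (x, y, z) tuples, assumed to be
    expressed in the same coordinate frame and the same unit.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error: points are skeleton joints."""
    return mean_point_error(pred_joints, gt_joints)


def pve(pred_vertices, gt_vertices):
    """Per-vertex error: points are vertices of the body mesh."""
    return mean_point_error(pred_vertices, gt_vertices)


# Toy example with two joints (hypothetical values, in millimetres).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, gt)  # distances are 0 and 5, so the mean is 2.5
```

In practice these errors are averaged over all frames of a test set, and many benchmarks first align the predicted and ground-truth root joints. That step is omitted here to keep the sketch short.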

Digital Twin for Digital Health: Body Joint Modeling and 3D Pose Reconstruction

Mpose: Environment- and Subject-Agnostic 3D Skeleton Posture Reconstruction Leveraging a Single Mmwave Device

Construction of Human Digital Twin Model Based on Multimodal Data and Its Application in Locomotion Mode Identification

Dynamic Human Body Reconstruction and Motion Tracking with Low-Cost Depth Cameras

High-precision Human Body Acquisition Via Multi-View Binocular Stereopsis

Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery

LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies

Symmetry-aware Kinematic Skeleton Generation of a 3D Human Body Model

Anatomically Detailed Simulation of Human Torso

3D joints estimation of human body using part segmentation

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts

A Review: Point Cloud-Based 3D Human Joints Estimation

Multi-Branch High-Dimensional Guided Transformer-Based 3D Human Posture Estimation

Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies from Single RGB Images

A 3D 2-subiteration thinning algorithm for human pose estimation

Reconstructing 3D human pose and shape from a single image and sparse IMUs

Towards Realistic 3D Human Motion Prediction with A Spatio-temporal Cross-transformer Approach