Abstract

Background: Model-based 3D pose estimation is widely used in 3D human motion analysis, where vision-based and inertial-based approaches form two distinct lines of work. Multi-view images from a vision-based markerless capture system provide essential data for motion analysis, but erroneous estimates still occur because of ambiguity, occlusion, or noise in the images. In addition, multi-view setups are difficult to deploy in the wild. Inertial measurement units (IMUs) can measure orientation accurately and are unaffected by occlusion, but they are susceptible to magnetic field interference and drift. Hybrid motion capture has therefore drawn researchers' attention in recent years. Existing hybrid 3D pose estimation methods jointly optimize the 3D pose parameters by minimizing the discrepancy between the image and IMU data. However, these methods still suffer from complex peripheral devices, sensitivity to initialization, and slow convergence.

Methods: This article presents an approach that improves 3D human pose estimation by fusing a single image with sparse IMU data. Building on a dual-stream feature extraction network, we design a model-attention network with a residual module to closely couple the dual-modal features from a static image and sparse IMUs. The final 3D pose and shape parameters are obtained directly by regression.

Results: Extensive experiments are conducted on two benchmark datasets for 3D human pose estimation. Compared with state-of-the-art methods, the per-vertex error (PVE) of the human mesh is reduced by 9.4 mm on the Total Capture dataset, and the mean per joint position error (MPJPE) is reduced by 7.8 mm on the Human3.6M dataset. The quantitative comparison demonstrates that the proposed method can effectively fuse sparse IMU data and images and improve pose accuracy.
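The two error metrics reported in the results can be illustrated with a short example. This is a minimal sketch, assuming both PVE and MPJPE are the plain mean Euclidean distance between predicted and ground-truth 3D points, with no root or Procrustes alignment applied beforehand. The abstract does not state the exact evaluation protocol, and the function names `mean_per_point_error`, `mpjpe`, and `pve` are hypothetical, chosen only for illustration.

```python
import math


def mean_per_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    pred, gt: equal-length sequences of (x, y, z) tuples in the same
    unit (for example millimetres). Assumption: no alignment step is
    applied before the distances are measured.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


def mpjpe(pred_joints, gt_joints):
    """Mean per joint position error: the points are skeleton joints."""
    return mean_per_point_error(pred_joints, gt_joints)


def pve(pred_vertices, gt_vertices):
    """Per-vertex error: the points are vertices of the body mesh."""
    return mean_per_point_error(pred_vertices, gt_vertices)


# Toy data (made up for illustration, not taken from any dataset).
gt_joints = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred_joints = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]

# Joint errors are 5 mm and 12 mm, so the mean is 8.5 mm.
print(mpjpe(pred_joints, gt_joints))
```

Under this definition the two metrics differ only in which point set they average over: MPJPE uses a small number of skeleton joints, while PVE uses every vertex of the reconstructed mesh and is therefore also sensitive to body shape.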

A Hybrid Approach for Cross-modality Pose Estimation Between Image and Point Cloud

Global Pose Estimation Iterative Algorithm for Multi-camera from Point and Line Correspondences

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

Hybrid model for Single-Stage Multi-Person Pose Estimation

Self-Attention Mechanism-Based Head Pose Estimation Network with Fusion of Point Cloud and Image Features

Attention-Enhanced Cross-modal Localization Between Spherical Images and Point Clouds

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

A Pose Estimation Algorithm for Multimodal Data Fusion

HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

Object Pose Estimation with Point Cloud Data for Robot Grasping

Robust Classification and 6D Pose Estimation by Sensor Dual Fusion of Image and Point Cloud Data

Reconstructing 3D human pose and shape from a single image and sparse IMUs

PoseDet: Fast Multi-Person Pose Estimation Using Pose Embedding

A multilevel object pose estimation algorithm based on point cloud keypoints

Multi-Scale Supervised Network for Human Pose Estimation

HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion

View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors

Multi-Person Pose Estimation with Enhanced Channel-wise and Spatial Information

Joint Multi-Person Pose Estimation and Semantic Part Segmentation

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention