Abstract:

Background: Model-based 3D pose estimation is widely used in 3D human motion analysis, where vision-based and inertial-based methods form two distinct lines of work. Multi-view images from a vision-based markerless capture system provide essential data for motion analysis, but erroneous estimates still occur because of ambiguities, occlusion, or image noise. In addition, a multi-view setup is hard to deploy in the wild. Inertial measurement units (IMUs) provide accurate orientation measurements that are unaffected by occlusion, but they are susceptible to magnetic field interference and drift. Hybrid motion capture has therefore drawn the attention of researchers in recent years. Existing hybrid 3D pose estimation methods jointly optimize the 3D pose parameters by minimizing the discrepancy between the image and IMU data. However, these methods still suffer from issues such as complex peripheral devices, sensitivity to initialization, and slow convergence.

Methods: This article presents an approach to improve 3D human pose estimation by fusing a single image with sparse IMUs. Based on a dual-stream feature extraction network, we design a model-attention network with a residual module to closely couple the dual-modal features from a static image and sparse IMUs. The final 3D pose and shape parameters are obtained directly by regression.

Results: Extensive experiments are conducted on two benchmark datasets for 3D human pose estimation. Compared with state-of-the-art methods, the per-vertex error (PVE) of the human mesh is reduced by 9.4 mm on the Total Capture dataset, and the mean per joint position error (MPJPE) is reduced by 7.8 mm on the Human3.6M dataset. The quantitative comparison demonstrates that the proposed method effectively fuses sparse IMU data with images and improves pose accuracy.
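The two metrics reported above share one basic form: the mean Euclidean distance between corresponding predicted and ground-truth 3D points, taken over joints for MPJPE and over mesh vertices for PVE. The sketch below illustrates only that generic definition. The function name `mean_point_error` and the toy coordinates are hypothetical, and the abstract does not state the alignment protocol used before measuring error (for example, root-joint alignment), so none is applied here.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    Applied to joints this is the usual MPJPE form; applied to mesh
    vertices it is the usual PVE form. No alignment step is performed
    (assumption: inputs are already expressed in the same frame).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy data in millimetres (hypothetical values, not taken from the paper).
pred_joints = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt_joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

# Per-joint distances are 0 and 5, so the mean is 2.5.
mpjpe_mm = mean_point_error(pred_joints, gt_joints)
```

Because the same function covers both metrics, the difference between MPJPE and PVE in this sketch lies only in which point set is passed in: a small set of skeleton joints or the full set of mesh vertices.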

Instance-level 3D shape retrieval from a single image by hybrid-representation-assisted joint embedding

Feature Representation for 3D Object Retrieval Based on Unconstrained Multi-View

Research and Realization on 3D Model Retrieval Based on Hybrid Shape Feature

Sketch-Based 3D Model Retrieval via Multi-feature Fusion

A Unified Feature Representation and Learning Framework for 3D Shape

Unify 3D Shape Retrieval and Classification in One Framework

A Metric Learning Method for Image-based 3D Shape Retrieval

Multi-View 3d Object Retrieval with Deep Embedding Network

3D Shape Retrieval Based on Laplace Operator and Joint Bayesian Model

Learning Discriminative and Generative Shape Embeddings for Three-Dimensional Shape Retrieval

Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration

Non-rigid 3D Shape Retrieval Using Multidimensional Scaling and Bag-of-Features

Image-text matching using multi-subspace joint representation

Learning Robust Point Representation for 3D Non-Rigid Shape Retrieval

Semi-Heterogeneous Three-Way Joint Embedding Network for Sketch-Based Image Retrieval

Im2Struct: Recovering 3D Shape Structure from a Single RGB Image

Visual Similarity Based 3D Shape Retrieval Using Bag-of-Features

Reconstructing 3D human pose and shape from a single image and sparse IMUs

Deep Single-View 3D Object Reconstruction with Visual Hull Embedding

Single Image 3D Shape Retrieval Via Cross-Modal Instance and Category Contrastive Learning

Deep Sketch-Shape Hashing With Segmented 3D Stochastic Viewing