3D Human Pose Estimation with Single Image and Inertial Measurement Unit (IMU) Sequence

Liujun Liu, Jiewen Yang, Ye Lin, Peixuan Zhang, Lihua Zhang
DOI: https://doi.org/10.1016/j.patcog.2023.110175
IF: 8
2024-01-01
Pattern Recognition
Abstract: Three-dimensional human pose estimation plays an important role in computer vision, with applications in healthcare, sports, activity recognition, motion capture, and augmented reality. However, monocular image- or video-based methods are sensitive to occlusions, while multi-view methods usually require enormous computational resources. Inertial measurement unit (IMU)-based methods have recently begun to overcome the occlusion problem and can potentially achieve real-time inference, yet they still suffer from insufficient precision and scale drift error over time. In this paper, we propose a novel, efficient framework that fuses a single image with a temporal sequence from IMU sensors to estimate human poses and reconstruct human shapes. Our method achieves 46 mm Mean Per Joint Positional Error (MPJPE) on the Total Capture dataset with a 30-frame time segment, and surpasses state-of-the-art pure IMU-based methods. Moreover, in comparison with other vision-based methods, the proposed method shows a great advantage in reducing the required floating point operations per second (FLOPS) while still achieving competitive estimation precision. Our method achieves 74 FPS on an iPhone 12 for offline processing. In addition, our method can easily be generalized to outdoor cases.
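For readers unfamiliar with the headline metric, MPJPE is conventionally the mean Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below follows that standard definition only; the abstract does not state the paper's exact evaluation protocol (for example, root alignment or the joint set used), and the function name `mpjpe` and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # one joint as (x, y, z), here in millimetres


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding joints (standard MPJPE)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    # math.dist gives the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy two-joint pose; the numbers are made up for illustration.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
result = mpjpe(pred, gt)
print(result)  # per-joint errors are 0 mm and 5 mm, so the mean is 2.5
```

Under this definition, a reported 46 mm MPJPE means that predicted joints lie on average 46 mm from their ground-truth positions over the evaluated frames.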