Unsupervised Universal Hierarchical Multi-Person 3D Pose Estimation for Natural Scenes

Gu Renshu, Jiang Zhongyu, Wang Gaoang, McQuade Kevin, Hwang Jenq-Neng
DOI: https://doi.org/10.1007/s11042-022-13079-5
IF: 2.577
2022-01-01
Multimedia Tools and Applications
Abstract: Multi-person 3D pose estimation using a freely moving monocular camera in real-world scenarios remains a challenge. Data with 3D ground truth are scarce, and real-world scenes usually contain self-occlusions and inter-person occlusions. To address these challenges, an unsupervised Universal Hierarchical 3D Human Pose Estimation (UH3DHPE) method is proposed that optimizes the torso and limb poses within a hierarchical framework. To handle occluded or inaccurate 2D torso keypoints, which play an important role in 3D pose initialization and subsequent inference, an effective method is proposed that directly estimates limb poses without building upon the estimated torso pose; the torso pose can then be further refined to form the hierarchy in a bottom-up fashion. An adaptive merging strategy is proposed to determine the best hierarchy. To verify the effectiveness of the proposed scheme, a video dataset of multi-person interactions is collected with a moving camera in a Motion Capture (MoCap) environment that provides ground truth for performance evaluation. Experimental results show that the proposed method outperforms state-of-the-art methods in multi-person, moving-camera scenarios.