Learning with Privileged Stereo Knowledge for Monocular Absolute 3D Human Pose Estimation
Cunling Bian, Weigang Lu, Feng Wei, Song Wang
DOI: https://doi.org/10.2139/ssrn.4789240
2024-01-01
Abstract: Absolute 3D Human Pose Estimation (3DHPE) from a calibrated monocular camera has been studied extensively with deep learning. Even so, its accuracy and generalization still lag behind methods that rely on multi-view images. To narrow this gap, we introduce a method that exploits stereo knowledge extracted from multi-view images, but only during training. Our approach first constructs monocular and multi-view representations of the same scene and integrates them within a unified voxel space. This space matches the inherently 3D nature of pose estimation and also facilitates knowledge transfer across viewpoints. We then apply a stereo knowledge distillation algorithm that combines several forms of stereo knowledge from the multi-view teacher through volumetric feature imitation, probability distribution mimicking, and joint correlation congruence. In addition, we introduce a strategy for reconciling conflicting gradients, which addresses the inherent tension between the knowledge distillation and pose estimation objectives. Finally, we present a multi-person extension that makes the approach applicable to more general scenes. Extensive experiments on widely used benchmarks, including Human3.6M, CMU Panoptic, Campus, and Shelf, demonstrate the effectiveness of the proposed method.
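The abstract names the distillation terms but gives no formulas. As a rough illustration only, the sketch below assumes common textbook forms for each: mean squared error for volumetric feature imitation, KL divergence between voxel-wise softmax distributions for probability distribution mimicking, and matching of per-joint cosine-similarity matrices for joint correlation congruence. For the conflicting-gradient strategy it swaps in a PCGrad-style ("gradient surgery") projection, which may well differ from what the authors do. All function names, dictionary keys, and weights are hypothetical, and this is not the authors' implementation.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a flat list of voxel scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_imitation(student_feat, teacher_feat):
    # Assumed form: mean squared error between flattened volumetric features.
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def distribution_mimicking(student_logits, teacher_logits):
    # Assumed form: KL(teacher || student) between voxel-wise joint probability distributions.
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))


def correlation_matrix(joint_feats, eps=1e-8):
    # Cosine similarity between every pair of per-joint feature vectors.
    norms = [math.sqrt(sum(v * v for v in f)) + eps for f in joint_feats]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(joint_feats, norms)]
        for fi, ni in zip(joint_feats, norms)
    ]


def correlation_congruence(student_joints, teacher_joints):
    # Assumed form: mean squared difference between student and teacher correlation matrices.
    c_s = correlation_matrix(student_joints)
    c_t = correlation_matrix(teacher_joints)
    n = len(c_s)
    return sum((c_s[i][j] - c_t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def stereo_kd_loss(student, teacher, weights=(1.0, 1.0, 1.0)):
    # Hypothetical keys "voxel_feat", "voxel_logits", "joint_feat"; weights are placeholders.
    w_f, w_p, w_c = weights
    return (
        w_f * feature_imitation(student["voxel_feat"], teacher["voxel_feat"])
        + w_p * distribution_mimicking(student["voxel_logits"], teacher["voxel_logits"])
        + w_c * correlation_congruence(student["joint_feat"], teacher["joint_feat"])
    )


def project_conflicting(g_task, g_kd):
    # PCGrad-style projection (a swapped-in technique, not the paper's stated method):
    # if the distillation gradient opposes the pose-estimation gradient,
    # remove its component along the pose-estimation gradient.
    dot = sum(a * b for a, b in zip(g_task, g_kd))
    if dot >= 0:
        return list(g_kd)
    norm_sq = sum(a * a for a in g_task)
    return [b - dot / norm_sq * a for a, b in zip(g_task, g_kd)]
```

For example, `project_conflicting([1.0, 0.0], [-1.0, 1.0])` returns `[0.0, 1.0]`: the part of the distillation gradient that pulls against the task gradient is removed, while the orthogonal part is kept. A projection is only one way to reconcile the two objectives; reweighting the losses is another.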