DBMHT: A Double-Branch Multi-Hypothesis Transformer for 3D Human Pose Estimation in Video

Weijie Bao, Xuezhi Xiang
DOI: https://doi.org/10.1016/j.cviu.2024.104147
2024-01-01
Abstract: Depth blur and self-occlusion in monocular images or videos present significant challenges for 3D human pose estimation. Recently, diffusion models have emerged as powerful tools for generating high-quality images from noise. Inspired by this capability, we design a diffusion-based 3D human pose estimation framework named DDBMHT. DDBMHT generates multiple plausible 3D human poses from a single 2D pose. It progressively diffuses the ground-truth 3D human poses into a random distribution and uses sequences of 2D human poses as conditions for a denoiser that restores the uncorrupted 3D poses. Additionally, DDBMHT employs a re-projection method that selects the appropriate joints for the final pose by comparing the difference coefficients between hypothesized and real joints. Extensive experiments on the Human3.6M dataset show that our method achieves state-of-the-art performance.
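The two mechanisms named in the abstract, diffusing ground-truth 3D poses into noise and selecting joints across hypotheses by re-projection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard DDPM closed-form forward step, a hypothetical pinhole camera with focal length `focal`, and it reads "difference coefficient" as the 2D distance between a re-projected hypothesis joint and the observed 2D joint. The function names `forward_diffuse`, `project`, and `select_joints` are illustrative only, and the conditional denoiser is omitted.

```python
import numpy as np


def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Standard DDPM forward step (assumed here): sample x_t from q(x_t | x_0)."""
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def project(pose_3d, focal=1.0):
    """Hypothetical pinhole projection in camera coordinates: (..., J, 3) -> (..., J, 2)."""
    return focal * pose_3d[..., :2] / pose_3d[..., 2:3]


def select_joints(hypotheses, pose_2d, focal=1.0):
    """Assemble one pose by picking, per joint, the hypothesis that re-projects closest.

    hypotheses: (H, J, 3) candidate 3D poses; pose_2d: (J, 2) observed 2D joints.
    """
    # Per-hypothesis, per-joint re-projection error, shape (H, J).
    err = np.linalg.norm(project(hypotheses, focal) - pose_2d[None], axis=-1)
    best = err.argmin(axis=0)  # index of the best hypothesis for each joint, shape (J,)
    return hypotheses[best, np.arange(hypotheses.shape[1])]  # (J, 3)


# Toy usage: two hypotheses, two joints, each hypothesis correct for one joint.
hyps = np.array([
    [[0.0, 0.0, 2.0], [1.0, 1.0, 2.0]],
    [[1.0, 0.0, 2.0], [2.0, 2.0, 2.0]],
])
observed_2d = np.array([[0.0, 0.0], [1.0, 1.0]])
final_pose = select_joints(hyps, observed_2d)
```

In this toy case the selected pose takes joint 0 from the first hypothesis and joint 1 from the second, which is the joint-level (rather than pose-level) selection the abstract describes.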