3D human pose estimation with multi-hypotheses gated transformer

Xiena Dong, Jian Zhang, Jun Yu, Ting Yu
DOI: https://doi.org/10.1007/s00530-024-01460-3
IF: 3.9
2024-10-08
Multimedia Systems
Abstract: Human pose estimation aims to locate human joints from inputs such as images and videos. Recent works have made significant progress in 3D human pose estimation, but they still face an ill-posed problem caused by the depth ambiguity of estimating a 3D pose from 2D keypoints in monocular video. This work proposes a novel Multi-Hypotheses Gated Transformer Network for 3D human pose estimation to alleviate this problem. The method generates multiple hypotheses by constructing multiple Transformer-based branches and then integrates the hypotheses through a gating module. Specifically, a Double-Gating Module is proposed to integrate two hypotheses, and it is extended into a general module that can integrate more than two hypotheses. The proposed approach is evaluated on the Human3.6M dataset, and the experimental results show that it outperforms state-of-the-art methods.
computer science, information systems, theory & methods
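
The abstract describes fusing several pose hypotheses through a gating module but gives no equations, so the following is only a minimal sketch of generic gated fusion, not the paper's Double-Gating Module. All names (`gate_two`, `gate_many`), the weight shapes, the sigmoid/softmax choice, and the 17-joint layout are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gate_two(h1, h2, w, b):
    """Fuse two hypotheses with an element-wise sigmoid gate (illustrative).

    h1, h2: (joints, dim) features from two hypothetical branches.
    w: (2 * dim, dim) and b: (dim,) are assumed learnable gate parameters.
    """
    g = sigmoid(np.concatenate([h1, h2], axis=-1) @ w + b)
    # The gate interpolates between the two hypotheses per feature.
    return g * h1 + (1.0 - g) * h2


def gate_many(hyps, w, b):
    """Fuse K hypotheses with per-joint softmax weights (illustrative).

    hyps: (K, joints, dim); w: (K * dim, K); b: (K,).
    Returns the fused (joints, dim) array and the (joints, K) weights.
    """
    k, j, d = hyps.shape
    # Concatenate all hypotheses for each joint: (joints, K * dim).
    flat = np.transpose(hyps, (1, 0, 2)).reshape(j, k * d)
    logits = flat @ w + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum over the hypothesis axis.
    fused = np.einsum("jk,kjd->jd", weights, hyps)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hyps = rng.normal(size=(3, 17, 4))  # 3 hypotheses, 17 joints (assumed), 4 features
    fused, weights = gate_many(hyps, np.zeros((12, 3)), np.zeros(3))
    print(fused.shape, weights.shape)
```

With zero gate parameters, both functions reduce to a plain average of the hypotheses; a trained gate would instead weight each hypothesis according to the input.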