Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image

Zheng Chen, Yi Sun
DOI: https://doi.org/10.1007/s10489-022-03764-1
IF: 5.3
2022-07-08
Applied Intelligence
Abstract: In monocular RGB-based 3D hand pose estimation, z coordinates are harder to estimate than 2D hand joint coordinates because of intrinsic depth ambiguity, so some works first estimate the 2D hand joint coordinates and then apply a 2D-to-3D lifting module to estimate the z coordinates. In this paper, we propose a new 2D-to-3D lifting module. Unlike existing methods, which estimate the z coordinates of all hand joints simultaneously, we estimate the z coordinate of each hand joint individually, taking its 2D joint features and the global image features as input. This divides the complex task into simple sub-tasks, making it easier to lift the 2D coordinates to 3D. In addition, our 2D-to-3D lifting module uses only convolutional operations with a shared convolutional kernel, so it has fewer network parameters than existing methods, which usually rely on fully connected layers. Furthermore, we introduce a new inter-joint attention module to learn the correlation between every pair of hand joints. We conduct experiments on two popular hand pose datasets. The experimental results show that our model achieves state-of-the-art performance compared with existing methods. An ablation study also verifies the validity of each component proposed in our model.
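The joint-wise idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' network: it assumes a 21-joint hand model, made-up feature sizes, and a single shared linear map (the equivalent of a 1x1 convolution) in place of the real convolutional layers. All names below (`lift_joint`, `lift_all`, `shared_param_count`, `fc_param_count`) are hypothetical and exist only for this illustration.

```python
import random

# Assumptions for illustration only; the abstract does not state these sizes.
NUM_JOINTS = 21   # a common hand-joint count, assumed here
JOINT_DIM = 8     # hypothetical per-joint 2D feature size
GLOBAL_DIM = 4    # hypothetical global image feature size


def make_shared_kernel(in_dim, seed=0):
    """Create one weight vector and bias that every joint reuses."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
    return weights, 0.0


def lift_joint(joint_feat, global_feat, kernel):
    """Estimate z for ONE joint from its own features plus global features."""
    weights, bias = kernel
    x = joint_feat + global_feat  # concatenate the two feature lists
    return sum(w * v for w, v in zip(weights, x)) + bias


def lift_all(joint_feats, global_feat, kernel):
    """Apply the same kernel to each joint independently (joint-wise lifting)."""
    return [lift_joint(f, global_feat, kernel) for f in joint_feats]


def shared_param_count(joint_dim, global_dim):
    """Parameters of the shared per-joint map: one weight per input plus a bias."""
    return joint_dim + global_dim + 1


def fc_param_count(num_joints, joint_dim, global_dim):
    """Parameters of one fully connected layer predicting all z values at once."""
    in_dim = num_joints * joint_dim + global_dim
    return in_dim * num_joints + num_joints


if __name__ == "__main__":
    kernel = make_shared_kernel(JOINT_DIM + GLOBAL_DIM)
    joint_feats = [[0.1 * j] * JOINT_DIM for j in range(NUM_JOINTS)]
    global_feat = [0.5] * GLOBAL_DIM
    z = lift_all(joint_feats, global_feat, kernel)
    print(len(z), "z estimates")
    print("shared:", shared_param_count(JOINT_DIM, GLOBAL_DIM))
    print("fully connected:", fc_param_count(NUM_JOINTS, JOINT_DIM, GLOBAL_DIM))
```

Under these assumed sizes, the shared map has 13 parameters while a single fully connected layer over all joints has 3633, which shows in miniature why sharing one kernel across joints reduces the parameter count. The real gap depends on the actual layer sizes, which the abstract does not give.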
computer science, artificial intelligence