Spatial-aware Stacked Regression Network for Real-Time 3D Hand Pose Estimation.

Pengfei Ren, Haifeng Sun, Weiting Huang, Jiachang Hao, Daixuan Cheng, Qi, Jingyu Wang, Jianxin Liao
DOI: https://doi.org/10.1016/j.neucom.2021.01.045
IF: 6
2021-01-01
Neurocomputing
Abstract: Making full use of the spatial information in depth data is crucial for 3D hand pose estimation from a single depth image. In this paper, we propose a Spatial-aware Stacked Regression Network (SSRN) for fast, robust and accurate 3D hand pose estimation from a single depth image. By adopting a differentiable pose re-parameterization process, our method efficiently encodes the pose-dependent 3D spatial structure of the depth data as spatial-aware representations. Taking these spatial-aware representations as inputs, the stacked regression network uses multi-joint spatial context and the 3D spatial relationship between the estimated pose and the depth data to predict a refined hand pose. To further improve estimation accuracy, we adopt a spatial attention mechanism that reduces the influence of features irrelevant to pose regression. To improve the speed of the network, we propose a cross-stage self-distillation mechanism that distills knowledge within the network itself. Experiments on four datasets show that the proposed method achieves state-of-the-art accuracy while running at around 330 FPS on a single GPU and 35 FPS on a single CPU. (c) 2021 Elsevier B.V. All rights reserved.
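The abstract does not spell out how an estimated pose is encoded against the depth data. The sketch below is one plausible reading, not the authors' formulation: each 3D point from the depth image is paired with an estimated joint and described by a closeness value plus a unit direction toward the joint. The function name `reparameterize`, the `radius` cutoff, and the linear closeness falloff are all assumptions made for illustration, and the code shows only the forward mapping, with no gradient computation.

```python
import math


def reparameterize(points, joint, radius=1.0):
    """Encode one estimated joint against a set of 3D points (illustrative only).

    points: list of (x, y, z) tuples, assumed to come from the depth image.
    joint:  (x, y, z) estimate of a single joint.
    radius: hypothetical cutoff; points at or beyond it get an all-zero code.
    Returns one (closeness, ux, uy, uz) tuple per point.
    """
    encoded = []
    for p in points:
        # Vector from the point to the joint estimate.
        offset = [j - c for j, c in zip(joint, p)]
        dist = math.sqrt(sum(o * o for o in offset))
        if dist >= radius:
            encoded.append((0.0, 0.0, 0.0, 0.0))
            continue
        # Closeness falls linearly from 1 at the joint to 0 at the cutoff.
        closeness = 1.0 - dist / radius
        # Unit direction toward the joint; undefined at zero distance, so use zeros.
        if dist == 0.0:
            unit = (0.0, 0.0, 0.0)
        else:
            unit = tuple(o / dist for o in offset)
        encoded.append((closeness, *unit))
    return encoded
```

Under this reading, a later regression stage would receive these per-point codes alongside the image features, so that a refined estimate can depend on where the current estimate sits relative to the observed surface.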