Abstract: State-of-the-art methods for 3D hand pose estimation from a single depth image are based on dense predictions, including voxel-to-voxel predictions, point-to-point regression, and pixel-wise estimations. Despite their good performance, these methods have inherent limitations, such as a poor trade-off between accuracy and efficiency and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address these issues. The key ideas are two-fold: (a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; (b) unifying the dense pixel-wise offset predictions and direct joint regression for end-to-end training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model the complex dependencies among joints and augment the representation capability of each pixel. We then densely estimate all pixels' offsets to the joints in both the image plane and depth space and calculate the joints' positions as a weighted average over all pixels' predictions, which removes the need for complex post-processing operations. The proposed model is implemented with an efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M parameters. Extensive experiments on multiple 3D hand pose estimation benchmarks demonstrate that the proposed method achieves new state-of-the-art accuracy while running efficiently at about 110 fps on a single NVIDIA 1080Ti GPU. The code is available at https://github.com/fanglinpu/JGR-P2O.

This work was supported in part by the National Natural Science Foundation of China under Grant 61976095 and in part by the Science and Technology Planning Project of Guangdong Province under Grant 2018B030323026. This work was also partially supported by the Academy of Finland.
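
The weighted-average step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat list-of-pixels layout, and the use of a softmax to normalize per-pixel scores into weights are all assumptions made for illustration. Each pixel at position (u, v, d) predicts an offset to a joint, so it casts a vote `pixel + offset`; the joint position is the weighted average of these votes.

```python
import math


def softmax(scores):
    """Normalize raw per-pixel scores into weights that sum to 1 (assumed weighting)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_from_pixel_votes(pixels, offsets, scores):
    """Estimate one joint as a weighted average of per-pixel votes.

    pixels:  list of (u, v, d) pixel positions (image plane plus depth)
    offsets: list of (du, dv, dd) predicted offsets from each pixel to the joint
    scores:  list of raw per-pixel confidence scores (hypothetical input)
    """
    weights = softmax(scores)
    joint = [0.0, 0.0, 0.0]
    for pixel, offset, w in zip(pixels, offsets, weights):
        for k in range(3):
            # Each pixel votes for the joint at pixel + offset.
            joint[k] += w * (pixel[k] + offset[k])
    return joint
```

If every pixel predicts its offset exactly, all votes coincide and the result equals the true joint position regardless of the weights. With imperfect offsets, the weights determine how much each pixel's vote contributes. Because every operation here is differentiable, a loss on the resulting joint position can in principle be backpropagated to the dense predictions, which is consistent with the end-to-end training the abstract mentions.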

3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks

A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks

Semi-Dynamic Hypergraph Neural Network for 3D Pose Estimation

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB

3D Human Pose Estimation Using Improved Semantic Graph Convolutional Based on Fusing Non-local Neural Network and Multi-Head Attention

Semantic Graph Convolutional Networks for 3D Human Pose Regression

Coarse-to-fine cascaded 3D hand reconstruction based on SSGC and MHSA

Coarse-to-Fine Hand-Object Pose Estimation with Interaction-Aware Graph Convolutional Network

High-order local connection network for 3D human pose estimation based on GCN

A residual semantic graph convolutional network with high-resolution representation for 3D human pose estimation in a virtual fashion show

Optimizing Network Structure for 3D Human Pose Estimation

3D hand pose and mesh estimation via a generic Topology-aware Transformer model

JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

Locally Connected Network for Monocular 3D Human Pose Estimation

3D Hand Pose Estimation via Regularized Graph Representation Learning

Relation-balanced graph convolutional network for 3D human pose estimation

SDFPoseGraphNet: Spatial Deep Feature Pose Graph Network for 2D Hand Pose Estimation

3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images

Hand3D: Hand Pose Estimation using 3D Neural Network

3D Hand Pose Estimation in the Wild via Graph Refinement under Adversarial Learning