Abstract: 3D hand pose estimation from a monocular RGB image is highly challenging due to self-occlusion, diverse appearances, and the inherent depth ambiguity of monocular images. Most previous methods first employ deep neural networks to fit 2D joint location maps, then combine them with implicit or explicit pose-aware features to directly regress 3D hand joint positions using purpose-designed network structures. However, the skeleton positions and the corresponding skeleton-aware content information in the latent space are invariably ignored. This skeleton-aware content bridges the gap between hand joint and hand skeleton information by relating the features of different hand joints to the distribution of hand skeleton positions in 2D space. To address this issue, we propose a simple yet efficient deep neural network that directly recovers a reliable 3D hand pose from monocular RGB images with a faster estimation process. Our aim is to reduce the model's computational complexity while maintaining high precision. To this end, we design a novel Feature Chat Block (FCB) for feature boosting, which enables enhanced interaction between joint and skeleton features. First, the FCB module updates joint features using a semantic graph convolutional network (GCN) and a multi-head self-attention mechanism. The GCN-based structure focuses on the physically connected hand joints encoded in a binary adjacency matrix, while the self-attention part attends to the hand joints in the complementary matrix. Then, the FCB module uses a query-key mechanism, with queries and keys representing joint and skeleton features respectively, to further implement feature interaction. After a stack of FCB modules, our model updates the fused features in a coarse-to-fine manner and finally outputs the predicted 3D hand pose.
We conducted a comprehensive set of ablation experiments on the InterHand2.6M dataset to validate the effectiveness and significance of the proposed method. Additionally, experimental results on the Rendered Hand Dataset, Stereo Hand Datasets, First-Person Hand Action Dataset, and FreiHAND Dataset show that our model surpasses state-of-the-art methods while offering faster inference.
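The two-stage update described for the FCB can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 21-joint layout (wrist plus four joints per finger), the 20 bone features, the feature width, the random weights, the use of a single attention head instead of multiple heads, the residual connection, and all function names are assumptions made only for illustration. The sketch shows a graph convolution restricted to the binary adjacency matrix, self-attention restricted to its complement, and a cross-attention step in which joint features act as queries and skeleton features act as keys.

```python
import numpy as np

NUM_JOINTS = 21  # assumed layout: wrist (index 0) + 5 fingers x 4 joints


def hand_adjacency(num_joints=NUM_JOINTS):
    """Binary adjacency (with self-loops) for the assumed 21-joint hand."""
    adj = np.eye(num_joints)
    for finger in range(5):
        # Each finger is a chain that starts at the wrist.
        chain = [0] + [1 + 4 * finger + k for k in range(4)]
        for a, b in zip(chain[:-1], chain[1:]):
            adj[a, b] = adj[b, a] = 1.0
    return adj


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gcn_branch(x, adj, w):
    """Row-normalised graph convolution over physically connected joints."""
    norm_adj = adj / adj.sum(axis=1, keepdims=True)
    return norm_adj @ x @ w


def masked_attention_branch(x, mask, wq, wk, wv):
    """Single-head self-attention limited to joint pairs where mask == 1."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask > 0, scores, -1e9)  # block masked-out pairs
    return softmax(scores) @ v


def joint_skeleton_interaction(joints, bones, wq, wk, wv):
    """Cross-attention: joints are queries, skeleton (bone) features are keys.

    Using the bone features as values and adding a residual connection
    are assumptions of this sketch.
    """
    q, k, v = joints @ wq, bones @ wk, bones @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (joints, bones)
    return joints + attn @ v, attn


rng = np.random.default_rng(0)
dim = 16  # hypothetical feature width
adj = hand_adjacency()
comp = 1.0 - adj  # complementary matrix: joint pairs with no bone between them

joints = rng.normal(size=(NUM_JOINTS, dim))     # stand-in joint features
bones = rng.normal(size=(NUM_JOINTS - 1, dim))  # stand-in skeleton features
weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(7)]

# Step 1: update joint features with the GCN and attention branches.
updated = gcn_branch(joints, adj, weights[0]) + masked_attention_branch(
    joints, comp, *weights[1:4]
)
# Step 2: let the updated joint features query the skeleton features.
fused, attn = joint_skeleton_interaction(updated, bones, *weights[4:7])
print(fused.shape, attn.shape)
```

Because the adjacency matrix and its complement partition all joint pairs, the two branches in this sketch cover local (bone-connected) and non-local joint relations without overlap. How the actual model combines the branches and stacks the blocks is not specified in the abstract.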

QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation

3D Hand Pose Estimation Using Semantic Dynamic Hypergraph Convolutional Networks

Pose-Guided Hierarchical Graph Reasoning for 3-D Hand Pose Estimation from a Single Depth Image.

HMTNet: 3D Hand Pose Estimation from Single Depth Image Based on Hand Morphological Topology

SegPoseNet: Segmentation-Guided 3D Hand Pose Estimation

SDFPoseGraphNet: Spatial Deep Feature Pose Graph Network for 2D Hand Pose Estimation

JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Multistage 3D Hand Pose Estimation Algorithm Based on Skeleton Points

MSMB-GCN: Multi-scale Multi-branch Fusion Graph Convolutional Networks for 3D Human Pose Estimation

Hand3D: Hand Pose Estimation using 3D Neural Network

3D Hand Pose and Shape Estimation from Monocular RGB Via Efficient 2D Cues

GHand - A Graph Convolution Network for 3D Hand Pose Estimation.

3D Hand Pose Estimation from Monocular RGB with Feature Interaction Module

3D hand pose and mesh estimation via a generic Topology-aware Transformer model

DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth

Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation

Accurate 3D Hand Pose Estimation Network Utilizing Joints Information.

3D Hand Pose Estimation Algorithm Based on Cascaded Features and Graph Convolution

MH-Net: Multiheaded 3D Hand Pose Estimation Network with 3D Anchorsets and Improved Multiscale Vision Transformer

Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training