Abstract: In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation is critically important. However, inferring reasonable and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer, which takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and the GCN, capturing both long-range dependencies and local topological connections between joints. In addition, we replace the standard MLP prediction head with a novel Topology-aware head that better exploits local topological constraints to produce more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head to produce MANO parameters. Results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh is competitive with previous state-of-the-art 3D hand mesh estimation methods.

HMTNet: 3D Hand Pose Estimation from Single Depth Image Based on Hand Morphological Topology

Accurate 3D Hand Pose Estimation Network Utilizing Joints Information

Hand3D: Hand Pose Estimation using 3D Neural Network

Hierarchical Topology Based Hand Pose Estimation from a Single Depth Image

Depth-Based 3D Hand Pose Estimation: from Current Achievements to Future Goals

Real-Time 3D Hand Pose Estimation with 3D Convolutional Neural Networks

MH-Net: Multiheaded 3D Hand Pose Estimation Network with 3D Anchorsets and Improved Multiscale Vision Transformer

HandFormer: Hand Pose Reconstructing from a Single RGB Image

Learning Hand Latent Features for Unsupervised 3D Hand Pose Estimation

3D Hand Shape and Pose Estimation from a Single RGB Image

Hand pose estimation in depth image using CNN and random forest

Networks Effectively Utilizing 2D Spatial Information for Accurate 3D Hand Pose Estimation

3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Cascaded hierarchical CNN for 2D hand pose estimation from a single color image

3D Hand Pose Estimation Algorithm Based on Cascaded Features and Graph Convolution

Hand Pose Estimation with Attention-and-Sequence Network

Towards Good Practices for Deep 3D Hand Pose Estimation

DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth

Estimate Hand Poses Efficiently from Single Depth Images

Learning a Deep Predictive Coding Network for a Semi-Supervised 3D-Hand Pose Estimation

QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation