Abstract: Hand gesture recognition is a challenging topic in computer vision. Multimodal hand gesture recognition based on RGB-D data achieves higher accuracy than recognition using RGB or depth data alone. This gain can reasonably be attributed to the complementary information present in the two modalities. In practice, however, multimodal data are not always easy to acquire simultaneously, whereas unimodal RGB or depth hand gesture data are more common. A desirable hand gesture system would therefore accept only unimodal RGB or depth data at test time while using multimodal RGB-D data during training to capture the complementary information. A method based on multimodal training and unimodal testing has already been proposed, but its unimodal feature representation and cross-modality transfer still need further improvement. To this end, this paper proposes a new 3D-Ghost and Spatial Attention Inflated 3D ConvNet (3DGSAI) to extract high-quality features for each modality. 3DGSAI uses the Inflated 3D ConvNet (I3D) as its baseline and introduces two main improvements: a 3D-Ghost module and a spatial attention mechanism. The 3D-Ghost module extracts richer features for hand gesture representation, and the spatial attention mechanism makes the network focus on the hand region. This paper also proposes an adaptive parameter for positive knowledge transfer, which ensures that knowledge is always transferred from the stronger modality network to the weaker one. Extensive experiments on the SKIG, VIVA, and NVGesture datasets demonstrate that our method is competitive with the state of the art. In particular, our method reaches 97.87% on the SKIG dataset using only RGB data, which is the current best result.
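The abstract does not give the formula for the adaptive transfer parameter. As a rough illustration only, the sketch below assumes one plausible form: an exponential gate clipped by a ReLU, weight = max(0, exp(beta * (L_self - L_other)) - 1), which is positive only when the other modality's classification loss is lower, i.e. when the other network is currently the stronger one. The function names `transfer_weight`, `alignment_loss`, and `total_loss`, the default `beta`, and the use of a mean squared feature distance are all hypothetical and are not taken from the paper.

```python
import math


def transfer_weight(loss_self: float, loss_other: float, beta: float = 2.0) -> float:
    """Hypothetical adaptive weight for transferring knowledge from `other` to `self`.

    exp(beta * (loss_self - loss_other)) - 1 is positive only when
    loss_self > loss_other, i.e. when the other modality is stronger.
    """
    gap = math.exp(beta * (loss_self - loss_other)) - 1.0
    # ReLU-style clip: a stronger network never learns from a weaker one.
    return max(0.0, gap)


def alignment_loss(feat_self, feat_other) -> float:
    """Mean squared distance between two feature vectors (illustrative stand-in)."""
    return sum((a - b) ** 2 for a, b in zip(feat_self, feat_other)) / len(feat_self)


def total_loss(cls_loss_self, cls_loss_other, feat_self, feat_other, beta=2.0) -> float:
    """Classification loss of this branch plus the gated transfer term.

    In a real training loop the other branch's features would be treated as a
    fixed target (no gradient flowing into the stronger network).
    """
    rho = transfer_weight(cls_loss_self, cls_loss_other, beta)
    return cls_loss_self + rho * alignment_loss(feat_self, feat_other)


# Example: the depth branch (loss 0.4) is stronger than the RGB branch (loss 0.9),
# so only the RGB branch receives a transfer term.
print(transfer_weight(0.9, 0.4))  # RGB learns from depth: positive weight
print(transfer_weight(0.4, 0.9))  # depth does not learn from RGB: 0.0
```

Clipping at zero, rather than letting the weight go negative, is what keeps the transfer one-directional in this sketch; `beta` controls how sharply the weight grows with the loss gap.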

A Multi-task Interaction Mechanism for 3D Hand Pose Estimation from RGB Image

CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

Dual Regression for Efficient Hand Pose Estimation

A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image

A hybrid network for estimating 3D interacting hand pose from a single RGB image

Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image

Simultaneous 3D Hand Detection and Pose Estimation Using Single Depth Images

Instance-level 6D pose estimation based on multi-task parameter sharing for robotic grasping

Multi-task human analysis in still images: 2D/3D pose, depth map, and multi-part segmentation

A 3D Hand Attitude Estimation Method for Fixed Hand Posture Based on Dual-View RGB Images

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

Efficient Bi-manipulation using RGBD Multi-model Fusion based on Attention Mechanism

On the Utility of 3D Hand Poses for Action Recognition

MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Joint Hand-Object 3D Reconstruction From a Single Image With Cross-Branch Feature Fusion

Attention-Based Pose Sequence Machine for 3D Hand Pose Estimation

Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer

Weakly Supervised Segmentation Guided Hand Pose Estimation During Interaction with Unknown Objects

Applying 3D Human Hand Pose Estimation to Teleoperation

Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image

Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning