Abstract: Hand gesture recognition is a challenging topic in computer vision. Multimodal hand gesture recognition based on RGB-D data achieves higher accuracy than recognition based on RGB or depth alone, a gain that comes from the complementary information in the two modalities. In practice, however, the two modalities are not always easy to acquire simultaneously, and unimodal RGB or depth gesture data are more common. A desirable system therefore needs only unimodal RGB or depth data at test time, while using multimodal RGB-D data during training to capture the complementary information. Methods based on multimodal training and unimodal testing have been proposed, but their unimodal feature representation and cross-modality transfer still need improvement. To this end, this paper proposes a new 3D-Ghost and Spatial Attention Inflated 3D ConvNet (3DGSAI) to extract high-quality features for each modality. The 3DGSAI network uses the Inflated 3D ConvNet (I3D) as its baseline and adds two main improvements: a 3D-Ghost module and a spatial attention mechanism. The 3D-Ghost module extracts richer features for hand gesture representation, and the spatial attention mechanism makes the network pay more attention to the hand region. This paper also proposes an adaptive parameter for positive knowledge transfer, which ensures that transfer always occurs from the strong modality network to the weak one. Extensive experiments on the SKIG, VIVA, and NVGesture datasets demonstrate that the method is competitive with the state of the art. In particular, its performance reaches 97.87% on the SKIG dataset using only RGB, which is the current best result.
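
The idea of positive knowledge transfer, in which knowledge flows only from the stronger modality network to the weaker one, can be illustrated with a small sketch. The abstract does not give the formula for the adaptive parameter, so the gate below is an assumption: it compares the two networks' classification losses and returns a nonzero weight only for the network that is currently worse. The names `transfer_weight`, `transfer_weights`, and `beta` are hypothetical and are not taken from the paper.

```python
import math


def transfer_weight(loss_receiver: float, loss_sender: float, beta: float = 1.0) -> float:
    """Hypothetical adaptive weight for one transfer direction.

    Assumption (not the paper's formula): the receiver is regularized
    toward the sender only when the sender's classification loss is lower,
    and the weight grows with the size of the gap.
    """
    gap = loss_receiver - loss_sender
    if gap <= 0:
        # The receiver is already at least as good: block the transfer.
        return 0.0
    return beta * (math.exp(gap) - 1.0)


def transfer_weights(loss_rgb: float, loss_depth: float, beta: float = 1.0) -> dict:
    """Weights for both directions; at most one of them is nonzero."""
    return {
        "rgb_from_depth": transfer_weight(loss_rgb, loss_depth, beta),
        "depth_from_rgb": transfer_weight(loss_depth, loss_rgb, beta),
    }


# Example: the depth network currently has the lower loss, so only the
# RGB network receives a transfer term.
weights = transfer_weights(loss_rgb=0.9, loss_depth=0.4)
print(weights)
```

Under this assumption the weight would scale a cross-modality regularization term added to each network's training loss. Because the comparison is repeated at every step, the direction of transfer can change if the weaker network overtakes the stronger one.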

Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer