Abstract: Hand gesture recognition is a challenging topic in computer vision. Multimodal hand gesture recognition based on RGB-D achieves higher accuracy than recognition using RGB or depth alone, and the gain originates from the complementary information in the two modalities. In practice, however, multimodal data are not always easy to acquire simultaneously, whereas unimodal RGB or depth hand gesture data are more common. A hand gesture system is therefore desirable in which only unimodal RGB or depth data are needed for testing, while multimodal RGB-D data are available for training so that the complementary information can be exploited. Methods based on multimodal training and unimodal testing have already been proposed, but unimodal feature representation and cross-modality transfer still need further improvement. To this end, this paper proposes a new 3D-Ghost and Spatial Attention Inflated 3D ConvNet (3DGSAI) to extract high-quality features for each modality. The baseline of the 3DGSAI network is the Inflated 3D ConvNet (I3D), to which two main improvements are made: a 3D-Ghost module and a spatial attention mechanism. The 3D-Ghost module extracts richer features for hand gesture representation, and the spatial attention mechanism makes the network pay more attention to the hand region. This paper also proposes an adaptive parameter for positive knowledge transfer, which ensures that the transfer always occurs from the strong modality network to the weak one. Extensive experiments on the SKIG, VIVA, and NVGesture datasets demonstrate that our method is competitive with the state of the art. In particular, our method reaches 97.87% on the SKIG dataset using only RGB, which is the current best result.
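
The idea of positive knowledge transfer described in the abstract can be illustrated with a small sketch. The abstract does not give the formula for the adaptive parameter, so everything below is an assumption made for illustration: the function names `transfer_weight` and `total_loss`, the `scale` argument, the use of training loss as the measure of which modality network is "stronger", and the linear form of the gate are all hypothetical and are not taken from the paper.

```python
def transfer_weight(loss_self: float, loss_other: float, scale: float = 1.0) -> float:
    """Hypothetical adaptive gate (not the paper's formula).

    Returns a positive weight only when the other modality network is
    stronger, which is assumed here to mean that it has a lower loss.
    """
    gap = loss_self - loss_other
    return scale * gap if gap > 0 else 0.0


def total_loss(task_loss: float, alignment_loss: float, weight: float) -> float:
    """Task loss plus a gated cross-modality alignment term."""
    return task_loss + weight * alignment_loss


# Example: the RGB network is currently weaker (higher loss) than the depth network.
rgb_loss, depth_loss = 1.5, 0.5
w_rgb = transfer_weight(rgb_loss, depth_loss)    # RGB network learns from depth
w_depth = transfer_weight(depth_loss, rgb_loss)  # depth network ignores RGB
```

With these illustrative numbers, only the weaker RGB network receives a nonzero transfer weight, so the alignment term never pulls the stronger network toward the weaker one.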

Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition

Multimode Gesture Recognition Algorithm Based on Convolutional Long Short-Term Memory Network

Large-Scale Multimodal Gesture Recognition Using Heterogeneous Networks

Multimodal Gesture Recognition Based on the ResC3D Network

Large-scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks

Continuous Gesture Segmentation and Recognition Using 3DCNN and Convolutional LSTM

Multi-modal learning for gesture recognition

Multimodal Gesture Recognition Based On Choquet Integral

Gesture Recognition with a 3-D Accelerometer

Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition

Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer

Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition

Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules

Multimodal Gesture Recognition for Mascot Robot System Based on Choquet Integral Using Camera and 3D Accelerometers Fusion

Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network

Gesture recognition based on deep deformable 3D convolutional neural networks

Multimodal Gesture Recognition Based on Attention Slow-Fast Fusion Networks

A ConvNet Structure Learning Spatiotemporal Features for Gesture Recognition

Attention in Convolutional LSTM for Gesture Recognition

Modality-convolutions: Multi-modal gesture recognition based on convolutional neural network