Abstract:<p>Dynamic gesture recognition, which plays an essential role in human-computer interaction, has been widely investigated but not yet fully solved. The challenge is threefold: 1) modeling the spatial appearance and the temporal evolution simultaneously; 2) handling interference from varied and complex backgrounds; 3) meeting the requirement of real-time processing. In this paper, we address these challenges by proposing a novel deep deformable 3D convolutional neural network for end-to-end learning, which not only achieves impressive accuracy on challenging datasets but also meets the requirement of real-time processing. We propose three types of very deep 3D CNNs for gesture recognition, which directly model spatiotemporal information with their inherent hierarchical structure. To eliminate background interference, a lightweight spatiotemporal deformable convolutional module is specially designed to augment the spatiotemporal sampling locations of the 3D convolution by learning additional offsets from the preceding feature map. It not only diversifies the shape of the convolution kernel to better fit the appearance of the hands and arms, but also helps the models pay more attention to the discriminative frames in the video sequence. The proposed method is evaluated on three challenging datasets, EgoGesture, Jester and Chalearn-IsoGD, and achieves state-of-the-art performance on all of them. Our model ranked first on Jester's official leaderboard at the time of submission. The code and the trained models are released to support communication and future work<a class="workspace-trigger" href="#fn0001"><sup>1</sup></a>.</p>
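
The idea of augmenting the 3D sampling locations with learned offsets can be illustrated with a minimal NumPy sketch. This is not the authors' module: it assumes a generic deformable convolution extended to three dimensions, a single-channel volume, a 3x3x3 kernel, and offsets passed in as an array (in a real network they would be predicted by a layer applied to the preceding feature map). The names `trilinear_sample` and `deformable_conv3d` are hypothetical.

```python
import itertools
import numpy as np


def trilinear_sample(vol, t, y, x):
    """Sample vol (T, H, W) at a fractional point; points outside count as zero."""
    t0, y0, x0 = int(np.floor(t)), int(np.floor(y)), int(np.floor(x))
    value = 0.0
    # Blend the 8 integer neighbours of the fractional location.
    for dt, dy, dx in itertools.product((0, 1), repeat=3):
        ti, yi, xi = t0 + dt, y0 + dy, x0 + dx
        inside = (0 <= ti < vol.shape[0]
                  and 0 <= yi < vol.shape[1]
                  and 0 <= xi < vol.shape[2])
        if inside:
            weight = (1 - abs(t - ti)) * (1 - abs(y - yi)) * (1 - abs(x - xi))
            value += weight * vol[ti, yi, xi]
    return value


def deformable_conv3d(vol, kernel, offsets):
    """Illustrative deformable 3D convolution (assumed form, not the paper's code).

    vol:     (T, H, W) input volume
    kernel:  (3, 3, 3) weights
    offsets: (T, H, W, 27, 3) per-location, per-tap shifts in (t, y, x) order
    """
    T, H, W = vol.shape
    # Regular 3x3x3 sampling grid around each output location.
    grid = list(itertools.product((-1, 0, 1), repeat=3))
    out = np.zeros((T, H, W), dtype=float)
    for t, y, x in itertools.product(range(T), range(H), range(W)):
        for k, (gt, gy, gx) in enumerate(grid):
            ot, oy, ox = offsets[t, y, x, k]
            # Each kernel tap reads from its grid position plus a learned shift.
            sample = trilinear_sample(vol, t + gt + ot, y + gy + oy, x + gx + ox)
            out[t, y, x] += kernel[gt + 1, gy + 1, gx + 1] * sample
    return out
```

With all offsets set to zero, the sketch reduces to an ordinary zero-padded 3D convolution (computed as cross-correlation). Nonzero spatial offsets let the kernel taps deviate from the regular grid, and nonzero temporal offsets let them read from neighbouring frames.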

Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition

Continuous Gesture Segmentation and Recognition Using 3DCNN and Convolutional LSTM

Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Gesture recognition based on deep deformable 3D convolutional neural networks

A ConvNet Structure Learning Spatiotemporal Features for Gesture Recognition

Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules

Dynamic Spatio-Temporal Feature Learning via Graph Convolution in 3D Convolutional Networks

Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition

Large-Scale Multimodal Gesture Recognition Using Heterogeneous Networks

Dynamic Gesture Recognition Based on Feature Fusion Network and Variant ConvLSTM

Selective spatiotemporal features learning for dynamic gesture recognition

Large-scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks

Large-scale Isolated Gesture Recognition Using Pyramidal 3D Convolutional Networks

Dynamic Hand Gesture Recognition Using Multi-direction 3D Convolutional Neural Networks

Recognition of Social Touch Gestures Using 3D Convolutional Neural Networks

Temporal Convolutional Neural Network for Gesture Recognition

Automatic 3D Skeleton-based Dynamic Hand Gesture Recognition Using Multi-Layer Convolutional LSTM

Effective Fusion of 3DCNN and Convolutional GRU for Gesture Recognition

Spatial-temporal Dynamic Hand Gesture Recognition Via Hybrid Deep Learning Model

High-Density Surface EMG-Based Gesture Recognition Using a 3D Convolutional Neural Network