Selective spatiotemporal features learning for dynamic gesture recognition
Xianlun Tang, Zhenfu Yan, Jiangping Peng, Bohui Hao, Huiming Wang, Jie Li
DOI: https://doi.org/10.1016/j.eswa.2020.114499
IF: 8.5
2021-05-01
Expert Systems with Applications
Abstract:<p>Gesture recognition, which aims to understand meaningful movements of the human body, plays an essential role in human–computer interaction. The key to gesture recognition is learning compact and effective spatiotemporal information. However, it remains a challenging task because of interference from gesture-irrelevant factors. A number of attempts have been made to address this problem by cascading deep heterogeneous architectures. However, this cascading strategy cannot capture both local and global spatiotemporal features at each stage of feature learning. In this paper, we propose a novel refined fusion model architecture that combines the ResC3D network and Convolutional LSTM (ConvLSTM) with a dynamic selection mechanism called Selective Spatiotemporal features learning (SeST). This heterogeneous network can simultaneously learn short-term and long-term spatiotemporal features, which are complementary to each other. Using soft attention, the SeST block lets the ResC3D network and ConvLSTM adaptively adjust their contributions to classification during feature learning. The method has been evaluated on three publicly available datasets: the Sheffield Kinect Gesture (SKIG) dataset, the ChaLearn LAP large-scale isolated gesture dataset (IsoGD), and the EgoGesture dataset. Experimental results show that the proposed method outperforms other state-of-the-art methods. In addition, our model is end-to-end and can be embedded in many intelligent-system applications.</p>
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
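The abstract says the SeST block uses soft attention to let the short-term (ResC3D) and long-term (ConvLSTM) branches adaptively weight their contributions, but it does not give the equations. The following is a minimal sketch, assuming a selective-kernel-style gate: sum the two branches, squeeze the result into a small descriptor, then apply a per-channel softmax across the two branches. The function name `selective_fusion`, the weight matrices, and the use of pooled feature vectors in place of full spatiotemporal feature maps are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    # Bias-free dense layer: returns w @ v.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def selective_fusion(
    short_term: Sequence[float],          # hypothetical pooled ResC3D features (C channels)
    long_term: Sequence[float],           # hypothetical pooled ConvLSTM features (C channels)
    w_reduce: Sequence[Sequence[float]],  # hypothetical d x C squeeze matrix
    w_short: Sequence[Sequence[float]],   # hypothetical C x d logit matrix, short-term branch
    w_long: Sequence[Sequence[float]],    # hypothetical C x d logit matrix, long-term branch
) -> List[float]:
    """Per-channel soft-attention selection between two feature branches.

    Assumed selective-kernel-style gate; not the paper's exact SeST formulation.
    """
    if len(short_term) != len(long_term):
        raise ValueError("branches must have the same number of channels")
    # 1. Fuse the branches by element-wise summation.
    fused = [a + b for a, b in zip(short_term, long_term)]
    # 2. Squeeze into a compact descriptor z = ReLU(W_reduce @ fused).
    z = [relu(x) for x in matvec(w_reduce, fused)]
    # 3. Compute one attention logit per channel for each branch.
    logit_s = matvec(w_short, z)
    logit_l = matvec(w_long, z)
    out = []
    for a, b, ls, ll in zip(short_term, long_term, logit_s, logit_l):
        # 4. Softmax over the two branches (max-shifted for numerical stability).
        m = max(ls, ll)
        es, el = math.exp(ls - m), math.exp(ll - m)
        alpha = es / (es + el)
        # 5. Convex combination: alpha weights short-term, 1 - alpha long-term.
        out.append(alpha * a + (1.0 - alpha) * b)
    return out


if __name__ == "__main__":
    # All-zero weights give equal logits, so alpha = 0.5 for every channel.
    print(selective_fusion([1.0, 2.0], [3.0, 4.0],
                           [[0.0, 0.0]], [[0.0], [0.0]], [[0.0], [0.0]]))
    # prints [2.0, 3.0]
```

Because the two branch weights sum to 1 for each channel, every fused value lies between the corresponding short-term and long-term values. This is one way a gate can shift emphasis between branches without rescaling the overall feature magnitude.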