A ConvNet Structure Learning Spatiotemporal Features for Gesture Recognition

Guishuang Fan, Shidong Jin, Fei Wang, Dan Yan
DOI: https://doi.org/10.1109/cipcv58883.2023.00017
2023-01-01
Abstract: Gesture recognition makes human-computer interaction more intuitive and natural, but recognizing complex dynamic gestures remains challenging, so building a powerful and efficient recognition model is critical. In this paper, we propose a new network structure for dynamic gesture recognition: S3D + ConvLSTM + MobileNet. Ordinary RGB video frames are fed into the network and processed sequentially by the three model sections. We evaluated the proposed method on two prominent large-scale gesture recognition datasets, Jester and IsoGD. Experimental results show that our approach achieves performance comparable to cutting-edge methods while reducing the computational overhead of training and prediction by nearly 70%.