A fast human action recognition network based on spatio-temporal features
Jie Xu, Rui Song, Haoliang Wei, Jinhong Guo, Yifei Zhou, Xiwei Huang
DOI: https://doi.org/10.1016/j.neucom.2020.04.150
IF: 6
2021-06-01
Neurocomputing
Abstract:<p>Artificial intelligence models are widely used in human activity recognition, of which human action recognition is an important part. The core of human action recognition is understanding the temporal relationship between video frames. Almost all state-of-the-art methods for human action recognition in videos use optical flow. However, traditional local optical flow estimation methods are computationally expensive and are not trained end-to-end. In this paper, we propose a fast network for human action recognition. Our aim is to improve the efficiency of optical flow feature extraction and to explore how spatio-temporal features can be fused. Our method combines spatial features and temporal features into fused features. In addition, we propose a CNN with OFF in place of the VGG16 network to process optical flow features and obtain richer features. Using only RGB inputs, our model achieves state-of-the-art accuracy of 91.5% on UCF-101, 67.9% on HMDB51, 83.3% on MSR Daily Activity3D, and 91.25% on Florence 3D action. Compared with most state-of-the-art video action recognition models, our proposed model effectively improves the accuracy of human action recognition.</p>
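The abstract describes combining spatial and temporal features into fused features but does not state how the fusion is done. The following is a minimal sketch of one common option, concatenation of normalized vectors, and is not the authors' method. The function names (`l2_normalize`, `temporal_difference`, `fuse_features`) are hypothetical, and the frame-to-frame feature difference is only a cheap stand-in for a learned motion representation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def temporal_difference(prev_feat: Vector, next_feat: Vector) -> Vector:
    """Assumed motion cue: element-wise difference of two frames' features."""
    if len(prev_feat) != len(next_feat):
        raise ValueError("feature vectors must have the same length")
    return [b - a for a, b in zip(prev_feat, next_feat)]


def fuse_features(spatial: Vector, temporal: Vector) -> Vector:
    """Assumed fusion: concatenate the normalized spatial and temporal vectors."""
    return l2_normalize(spatial) + l2_normalize(temporal)


# Toy per-frame feature vectors (illustrative values, not real data).
frame_t = [3.0, 4.0]
frame_t1 = [3.0, 7.0]
fused = fuse_features(frame_t, temporal_difference(frame_t, frame_t1))
print(fused)
```

Normalizing each stream before concatenation keeps one stream from dominating the fused vector purely because of its scale; other fusion schemes, such as weighted sums or learned fusion layers, would be equally plausible readings of the abstract.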
computer science, artificial intelligence