Select and Focus: Action Recognition with Spatial-Temporal Attention

Wensong Chan, Zhiqiang Tian, Shuai Liu, Jing Ren, Xuguang Lan
DOI: https://doi.org/10.1007/978-3-030-27535-8_41
2019-01-01
Abstract: With the rapid development of neural networks, human action recognition has improved greatly through the use of convolutional neural networks (CNNs) or recurrent neural networks (RNNs). In this paper, we propose a model based on weighted spatial-temporal attention for action recognition. The model selects the key parts of each video frame and the important frames of each video sequence, and then focuses its analysis on those parts and frames. Its most important task is therefore to find the key parts spatially and the important frames temporally for recognizing the action. Our model is trained and tested on three datasets: UCF-11, UCF-101, and HMDB51. The experiments demonstrate that our model achieves satisfactory results for human action recognition.
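The two-stage idea in the abstract (weight the parts within each frame, then weight the frames within the sequence) can be illustrated with a minimal sketch. The abstract does not say how attention scores are computed, so this example assumes a simple dot-product score against fixed query vectors; in a trained model these queries would be learned (for instance from an RNN hidden state), which is also an assumption. The function names `spatial_temporal_attention`, `softmax`, and `weighted_sum` are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Pool a list of vectors using one attention weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def spatial_temporal_attention(
    video: List[List[Vector]],
    spatial_query: Vector,   # hypothetical fixed query; learned in a real model
    temporal_query: Vector,  # hypothetical fixed query; learned in a real model
) -> Tuple[Vector, List[List[float]], List[float]]:
    """video[t][n] is the feature vector of region n in frame t.

    Returns the pooled video vector, the per-frame spatial weights,
    and the temporal weights over frames.
    """
    frame_vectors = []
    spatial_weights = []
    for regions in video:
        # Spatial attention: score each region, normalize, and pool into one frame vector.
        alpha = softmax([dot(r, spatial_query) for r in regions])
        spatial_weights.append(alpha)
        frame_vectors.append(weighted_sum(regions, alpha))
    # Temporal attention: score each pooled frame, normalize, and pool into one video vector.
    beta = softmax([dot(f, temporal_query) for f in frame_vectors])
    return weighted_sum(frame_vectors, beta), spatial_weights, beta


if __name__ == "__main__":
    # Two frames, two regions per frame, 2-D features (toy data).
    video = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 0.0], [0.0, 0.0]],
    ]
    vec, sw, tw = spatial_temporal_attention(video, [1.0, 0.0], [1.0, 0.0])
    # Region 0 dominates each frame, and frame 1 receives the larger temporal weight.
    print(sw, tw, vec)
```

Because both stages use a softmax, each frame's spatial weights and the temporal weights each sum to 1, so the pooled vectors stay on the same scale as the input features. The resulting video vector would then be passed to a classifier over the action labels.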