Recurrent Attention Network Using Spatial-Temporal Relations for Action Recognition

Mingxing Zhang, Yang, Yanli Ji, Ning Xie, Fumin Shen
DOI: https://doi.org/10.1016/j.sigpro.2017.12.008
IF: 4.729
2018-01-01
Signal Processing
Abstract: Action recognition in videos, which contain complex semantic content, remains a challenging task in computer vision research. In this paper, we propose a novel attention mechanism that leverages the gate system of Long Short-Term Memory (LSTM) to compute the attention weights for action recognition. The proposed attention mechanism is embedded in a recurrent attention network that explores the spatial-temporal relations between different local regions in order to concentrate on the important ones. For more accurate attention, we derive a new attention unit from the standard LSTM unit so that the importance of a local region depends only on its input gate. By exploring spatial-temporal relations and using the attention unit, our model attends more accurately and thus achieves better action recognition performance. We evaluate the proposed model on three datasets, UCF101, HMDB51 and Hollywood2, and the results show that it outperforms other attention models with significant improvements. (C) 2017 Elsevier B.V. All rights reserved.
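The idea of reading attention weights off an LSTM-style input gate can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not specify. The function name `input_gate_attention`, the parameters `W`, `U`, `b`, the averaging of gate activations into one score per region, and the sum normalization are all assumptions made only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, as used for LSTM gates."""
    return 1.0 / (1.0 + np.exp(-z))


def input_gate_attention(regions, h_prev, W, U, b):
    """Hypothetical gate-based attention over local regions of one frame.

    regions: (K, D) array, one feature vector per local region
    h_prev:  (H,) previous hidden state of the recurrent network
    W: (D, H), U: (H, H), b: (H,) input-gate parameters (assumed shapes)

    Returns (weights, attended): weights has shape (K,) and sums to 1,
    attended has shape (D,) and is the weighted sum of region features.
    """
    # Input-gate activation for each region, conditioned on the shared state.
    gates = sigmoid(regions @ W + h_prev @ U + b)  # shape (K, H)
    # Assumption: collapse each region's gate vector to one scalar score.
    scores = gates.mean(axis=1)  # shape (K,)
    # Assumption: normalize scores so they form a distribution over regions.
    weights = scores / scores.sum()
    attended = weights @ regions  # shape (D,)
    return weights, attended


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D, H = 49, 8, 4  # e.g. a 7x7 grid of regions; sizes are arbitrary
    weights, attended = input_gate_attention(
        rng.normal(size=(K, D)),
        np.zeros(H),
        rng.normal(size=(D, H)),
        rng.normal(size=(H, H)),
        np.zeros(H),
    )
    print(weights.shape, attended.shape)
```

In this sketch a region's weight depends only on its own input-gate activation, which mirrors the abstract's statement that importance depends only on the input gate. How the actual model couples regions across space and time is not covered here.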