Action Recognition Using Visual Attention with Reinforcement Learning.

Hongyang Li, Jun Chen, Ruimin Hu, Mei Yu, Huafeng Chen, Zengmin Xu
DOI: https://doi.org/10.1007/978-3-030-05716-9_30
2019-01-01
Abstract: Human action recognition in videos is a challenging and significant task with a broad range of applications. The advantage of the visual attention mechanism is that it can effectively reduce noise interference by focusing on the relevant parts of the image and ignoring the irrelevant parts. We propose a deep visual attention model with reinforcement learning for this task. We use a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units as a learning agent. The agent interacts with the video and decides both which frame to look at next and where the most relevant region of the selected frame is located. The REINFORCE method is used to learn the agent’s decision policy, and back-propagation is used to train the action classifier. The experimental results demonstrate that the glimpse window can focus on important clues. Our model achieves significant performance improvements on two action recognition datasets, UCF101 and HMDB51.
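As a rough illustration of the REINFORCE update mentioned in the abstract, the sketch below trains a softmax policy over a few candidate glimpse locations. It is a toy stand-in and not the paper's model: the LSTM agent, the video input, and the action classifier are all replaced by a hypothetical `reward_prob` list, which gives the assumed chance that a glimpse at each location leads to a correct classification. The function names, learning rate, and running-average baseline are likewise assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1


def train_glimpse_policy(reward_prob, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a toy problem (all names here are hypothetical).

    reward_prob[i] is the assumed probability that glimpsing at
    location i yields a correct classification (reward 1, else 0).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(reward_prob)  # policy parameters
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = sample(probs, rng)
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    policy = train_glimpse_policy([0.2, 0.9, 0.3])
    print([round(p, 3) for p in policy])
```

Under these assumptions the policy should shift probability toward the location with the highest reward rate. In the paper's setting the reward would come from the classifier's output on real video rather than from a fixed table.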