Better Deep Visual Attention with Reinforcement Learning in Action Recognition.

Gang Wang, Wenmin Wang, Jingzhuo Wang, Yaohua Bu
DOI: https://doi.org/10.1109/iscas.2017.8050638
2017-01-01
Abstract: Deep visual attention has attracted considerable interest in computer vision in recent years and has contributed substantially to image classification, image captioning, and action recognition. However, because existing models are trained wholly or partially with backpropagation (BP), they cannot show the true power of attention in computational efficiency and focusing accuracy. Our intuition is that an attention mechanism should resemble the process by which humans direct their attention and select the next location to focus on: by observing, analyzing, and jumping, rather than by describing continuous features as existing approaches do. Based on this insight, we formulate our model as a recurrent neural network-based agent that chooses an attention region by reinforcement learning at each timestep. In experiments, our model clearly outperforms the baselines in focusing and recognition accuracy while consuming far fewer computational resources, and can therefore be regarded as better deep visual attention.