Attention with Structure Regularization for Action Recognition.

Yuhui Quan, Yixin Chen, Ruotao Xu, Hui Ji
DOI: https://doi.org/10.1016/j.cviu.2019.102794
IF: 4.886
2019-01-01
Computer Vision and Image Understanding
Abstract: Recognizing human action in video is an important task with a wide range of applications. Recently, motivated by findings in human visual perception, there have been numerous attempts to introduce attention mechanisms into action recognition systems. However, it is empirically observed that implementing an attention mechanism with a free-form attention mask often produces ineffective, distracted attention regions as a result of overfitting, which limits the benefit of attention mechanisms for action recognition. By exploiting a block-structured sparsity prior on attention regions, this paper proposes an ℓ2,1-norm group sparsity regularization for learning structured attention masks. Built upon such a regularized attention module, an attention-based recurrent network is developed for action recognition. Experimental results on two benchmark datasets show that the proposed method noticeably improves the accuracy of attention masks, which in turn yields a performance gain in action recognition.
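To make the ℓ2,1-norm group sparsity idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the attention mask is a 2D grid split into non-overlapping square blocks; the function name `l21_block_norm`, the `block` parameter, and the grouping itself are illustrative assumptions, since the abstract does not specify how the groups are formed. The penalty sums the Euclidean norm of each block, so attention concentrated in a few blocks costs less than the same amount of attention scattered across many blocks.

```python
import math


def l21_block_norm(mask, block=2):
    """Sum of Euclidean norms over non-overlapping block x block groups.

    `mask` is a 2D list of attention weights. The square, non-overlapping
    grouping is an assumption made for illustration only.
    """
    h, w = len(mask), len(mask[0])
    if h % block or w % block:
        raise ValueError("mask dimensions must be divisible by the block size")
    total = 0.0
    for r in range(0, h, block):
        for c in range(0, w, block):
            # Inner l2 norm: square root of the sum of squares within one block.
            sq = sum(
                mask[i][j] ** 2
                for i in range(r, r + block)
                for j in range(c, c + block)
            )
            # Outer l1 norm: plain sum of the per-block l2 norms.
            total += math.sqrt(sq)
    return total


# Two 4x4 masks with the same total attention mass (4.0).
concentrated = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
scattered = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

print(l21_block_norm(concentrated))  # 2.0: one active block with norm sqrt(4)
print(l21_block_norm(scattered))     # 4.0: four active blocks, each with norm 1
```

In a training setting, a term of this kind would typically be added to the recognition loss with a weighting coefficient, which is again an assumption here and not something the abstract states.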