Channel-wise Temporal Attention Network for Video Action Recognition.

Jianjun Lei, Yalong Jia, Bo Peng, Qingming Huang
DOI: https://doi.org/10.1109/icme.2019.00103
2019-01-01
Abstract: Recently, video action recognition has received considerable attention, and deep-learning-based methods have achieved promising performance. Most existing methods focus on encoding spatiotemporal information to learn video representations but ignore the relationships among channels. In this paper, we propose a novel Channel-wise Temporal Attention Network (CTAN) to explore fine-grained key information for action recognition. First, a channel-wise attention generation module is proposed to emphasize fine-grained informative features in each frame. Then, a temporal information aggregation module is introduced before attention generation to exploit the interaction between different frames. Finally, a discriminative video-level representation for action recognition is generated through end-to-end training. Experimental results on two benchmarks, UCF101 and HMDB51, demonstrate the effectiveness of the proposed CTAN.
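The abstract names the pipeline stages but gives no equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' CTAN. It assumes each frame has already been pooled to a C-dimensional channel descriptor; temporal aggregation is approximated by a moving average over neighbouring frames; channel-wise attention is approximated by a squeeze-and-excitation-style bottleneck MLP with a sigmoid gate; and the video-level representation is a temporal average of the reweighted frame features. All function names, the window size `k`, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np


def temporal_aggregate(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical temporal aggregation: average each channel over a sliding
    window of k neighbouring frames (edge frames padded by replication).
    x has shape (T, C): one pooled C-dim descriptor per frame."""
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[t:t + k].mean(axis=0) for t in range(x.shape[0])])


def channel_attention(z: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Assumed SE-style gating: per-frame bottleneck MLP followed by a sigmoid,
    returning weights in (0, 1) with the same shape as z, i.e. (T, C)."""
    hidden = np.maximum(z @ w1, 0.0)              # (T, C // r), ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # (T, C)


def video_representation(x, w1, w2, k=3):
    z = temporal_aggregate(x, k)       # aggregate information across neighbouring frames first
    a = channel_attention(z, w1, w2)   # then generate per-frame, per-channel weights
    reweighted = x * a                 # emphasise informative channels in each frame
    return reweighted.mean(axis=0), a  # average over time -> video-level vector of shape (C,)


rng = np.random.default_rng(0)
T, C, r = 8, 16, 4
x = rng.standard_normal((T, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
video_vec, attn = video_representation(x, w1, w2)
print(video_vec.shape, attn.shape)  # (16,) (8, 16)
```

Aggregating over time before computing the gates mirrors the ordering stated in the abstract: each frame's channel weights can then depend on its neighbours rather than on that frame alone. In a trained model, `w1` and `w2` would be learned end to end together with the backbone.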