FENet: An Efficient Feature Excitation Network for Video-based Human Action Recognition

Zhan Zhang, Yi Jin, Songhe Feng, Yidong Li, Tao Wang, Hui Tian
DOI: https://doi.org/10.1109/ICSP56322.2022.9965349
2022-01-01
Abstract: Human action recognition is a challenging task because it requires modeling complex temporal and motion information. Most mainstream methods for this problem are based on 3D CNNs. However, 3D CNNs are computationally intensive and difficult to deploy on resource-limited devices. Moreover, most methods rarely consider the redundancy in feature maps, which increases storage requirements. To tackle these issues, we propose a feature excitation network (FENet) based on a 2D CNN framework. Specifically, we first build a 2D backbone that focuses on reducing spatial and channel redundancy in feature maps. We then design a feature excitation model (FEM) that uses an attention mechanism to excite features from multiple perspectives. Finally, we conduct a series of experiments to find the optimal combination of the 2D CNN and the FEM. Extensive experiments demonstrate the effectiveness of the proposed method on the action recognition task.
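The abstract does not specify how the FEM is built, so the following is only a minimal sketch of one common way attention can "excite" features produced by a 2D CNN applied frame by frame: a squeeze-and-excitation style channel gate. It is not the FENet/FEM design. It assumes frame features are stacked as an array of shape (N*T, C, H, W), where T is the number of sampled frames per video. The function name `channel_excitation` and the weight matrices `w1` and `w2` (a bottleneck with reduction ratio `r`) are hypothetical, and the sketch covers only the channel perspective, not the multiple perspectives the paper mentions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_excitation(x, num_segments, w1, w2):
    """Hypothetical SE-style channel gate for per-frame 2D CNN features.

    x: (N*T, C, H, W) features; num_segments: T frames per video.
    w1: (C, C // r) reduction weights; w2: (C // r, C) expansion weights.
    Illustrative sketch only, not the FEM described in the paper.
    """
    nt, c, h, w = x.shape
    n = nt // num_segments
    # Group the frames of each video together: (N, T, C, H, W)
    v = x.reshape(n, num_segments, c, h, w)
    # Squeeze: average over time and space, one descriptor per channel
    s = v.mean(axis=(1, 3, 4))            # (N, C)
    # Excite: bottleneck MLP (ReLU) followed by a sigmoid, weights in (0, 1)
    z = np.maximum(s @ w1, 0.0)           # (N, C // r)
    a = sigmoid(z @ w2)                   # (N, C)
    # Re-weight every frame of a video with that video's channel weights
    out = v * a[:, None, :, None, None]
    return out.reshape(nt, c, h, w)

# Usage with random features: 2 videos, 8 frames each, 16 channels
rng = np.random.default_rng(0)
N, T, C, H, W, r = 2, 8, 16, 7, 7, 4
x = rng.standard_normal((N * T, C, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
y = channel_excitation(x, T, w1, w2)
print(y.shape)  # (16, 16, 7, 7)
```

Because the gate only rescales channels, the output keeps the input shape and adds just the two small matrices as parameters. That low cost is the general appeal of attention-based excitation on top of a 2D backbone, compared with replacing 2D convolutions by 3D ones.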