Collaborative Positional-Motion Excitation Module for Efficient Action Recognition.

Tamam Alsarhan, Hongtao Lu
DOI: https://doi.org/10.1007/978-3-030-89370-5_11
2021-01-01
Abstract: Vision-based action recognition has progressed rapidly in the last few years, owing to advances in deep convolutional neural networks (CNNs). In contrast with 2D CNN-based approaches, 3D CNN-based approaches can effectively capture both spatial and temporal features. However, they are computationally intensive. To boost 2D CNN performance, most existing methods leverage channel attention (e.g., squeeze-and-excitation), which, despite its strong impact on model performance, operates only on the channel dimension and ignores the spatial dimension. In this work, we design a generic and collaborative excitation module, the Collaborative Positional-Motion Excitation Module (CPME), for action recognition. CPME is a dual-pathway excitation module designed to embed two crucial types of information, positional information and motion information, for efficient action recognition. The Positional Enhancement Pathway (PEP), the first pathway of CPME, encodes direction-aware and position-sensitive information. The Motion Enhancement Pathway (MEP), the second pathway, encodes motion information by emphasizing the informative features in each frame and exciting motion-sensitive channels. We integrate the proposed CPME into 2D CNNs to form a simple yet effective CPME-Net with limited extra computational cost. Finally, a discriminative and diverse video-level representation for action recognition is generated by end-to-end training. Experiments on two popular action recognition datasets demonstrate that CPME blocks improve the performance of the 2D CNN baseline, and that our method achieves competitive results against state-of-the-art methods.
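
For context, the channel attention that the abstract contrasts against can be sketched as follows. This is a minimal NumPy illustration of a generic squeeze-and-excitation gate, not the CPME module itself. The function names (`se_gate`, `se_block`), the tensor shapes, the reduction ratio `r`, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def se_gate(x, w1, w2):
    """Per-channel gate for a feature map x of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)            # channel reduction + ReLU -> (C // r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ h)))  # channel expansion + sigmoid -> (C,)

def se_block(x, w1, w2):
    """Rescale each channel of x by its gate value."""
    s = se_gate(x, w1, w2)
    return x * s[:, None, None]            # broadcast one scalar over H and W

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

gate = se_gate(x, w1, w2)
y = se_block(x, w1, w2)
```

Because `gate` holds a single scalar per channel, every spatial position within a channel is scaled by the same factor. This is the sense in which such attention operates only on the channel dimension.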