Spatio‐temporal attention modules in orientation‐magnitude‐response guided multi‐stream CNNs for human action recognition

Fatemeh Khezerlou,Aryaz Baradarani,Mohammad Ali Balafar,Roman Gr. Maev
DOI: https://doi.org/10.1049/ipr2.13104
IF: 2.3
2024-04-24
IET Image Processing
Abstract:This paper, presents a multi‐stream CNN‐based model that leverages various attention modules to analyze multimodal data obtained from video cameras and Kinect sensors. A novel motion encoding technique, orientation‐magnitude response maps, is introduced to effectively capture the historical motion of actions, representing long‐frame sequences as a single motion history image. This paper introduces a new descriptor called orientation‐magnitude response maps as a single 2D image to effectively explore motion patterns. Moreover, boosted multi‐stream CNN‐based model with various attention modules is designed for human action recognition. The model incorporates a convolutional self‐attention autoencoder to represent compressed and high‐level motion features. Sequential convolutional self‐attention modules are used to exploit the implicit relationships within motion patterns. Furthermore, 2D discrete wavelet transform is employed to decompose RGB frames into discriminative coefficients, providing supplementary spatial information related to the actors actions. A spatial attention block, implemented through the weighted inception module in a CNN‐based structure, is designed to weigh the multi‐scale neighbours of various image patches. Moreover, local and global body pose features are combined by extracting informative joints based on geometry features and joint trajectories in 3D space. To provide the importance of specific channels in pose descriptors, a multi‐scale channel attention module is proposed. For each data modality, a boosted CNN‐based model is designed, and the action predictions from different streams are seamlessly integrated. The effectiveness of the proposed model is evaluated across multiple datasets, including HMDB51, UTD‐MHAD, and MSR‐daily activity, showcasing its potential in the field of action recognition.
computer science, artificial intelligence,engineering, electrical & electronic,imaging science & photographic technology
What problem does this paper attempt to address?