Improved skeleton-based activity recognition using convolutional block attention module
Jing Qin, Shugang Zhang, Yiguo Wang, Fei Yang, Xin Zhong, Weigang Lu
DOI: https://doi.org/10.1016/j.compeleceng.2024.109231
IF: 4.152
2024-04-06
Computers & Electrical Engineering
Abstract: Inferring human activities from skeletons extracted from photos or videos is a fundamental problem in computer vision. Current skeleton-based activity recognition methods extract features from a spatial perspective, a temporal perspective, or both; however, the extracted features receive no further post-processing such as feature augmentation. To address this shortcoming, this article presents a new human activity recognition approach based on human skeleton data, named Convolutional Block Attention Module-based Spatial Temporal Convolutional Network (CBAM-STGCN). To make the generated features more discriminative, we use the attention mechanism to weight feature units within each dimension by introducing two attention modules, each designed for a different aspect, into the model architecture. Specifically, the original feature map of an activity sample is adaptively refined by the convolutional block attention module, in which two separate attention maps, computed over the spatial and channel dimensions, are used to generate the corresponding attended feature maps. Evaluation on the Kinetics benchmark dataset showed that the proposed model improves on the conventional baseline model by 1.76% and 2.43% on the top-1 and top-5 metrics, respectively. Further ablation experiments showed that both the channel and spatial attention modules contribute to model performance.
engineering, electrical & electronic; computer science, interdisciplinary applications, hardware & architecture
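
The attention refinement described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a feature map laid out as [channels][frames][joints], applies channel attention before spatial attention (the ordering of the original convolutional block attention module design, which the abstract does not confirm for this model), and uses hypothetical names (`channel_attention`, `spatial_attention`, `cbam`, `w1`, `w2`, `kernel`) with weights supplied by the caller rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(x, w1, w2):
    """Return one weight in (0, 1) per channel.

    x  : feature map as nested lists [C][T][V] (assumed layout)
    w1 : [C // r][C] weights of the shared MLP's hidden layer
    w2 : [C][C // r] weights of the shared MLP's output layer
    """
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[val for frame in ch for val in frame] for ch in x]
    avg = [sum(f) / len(f) for f in flat]   # average pooling over frames and joints
    mx = [max(f) for f in flat]             # max pooling over frames and joints
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(x, kernel):
    """Return a [T][V] map of weights in (0, 1).

    kernel : [2][k][k] convolution weights applied to the stacked
             (channel-average, channel-max) maps with zero padding.
    """
    C, T, V = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[c][t][v] for c in range(C)) / C for v in range(V)] for t in range(T)]
    mx = [[max(x[c][t][v] for c in range(C)) for v in range(V)] for t in range(T)]
    maps = [avg, mx]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * V for _ in range(T)]
    for t in range(T):
        for v in range(V):
            s = 0.0
            for m in range(2):
                for i in range(k):
                    for j in range(k):
                        tt, vv = t + i - pad, v + j - pad
                        if 0 <= tt < T and 0 <= vv < V:
                            s += kernel[m][i][j] * maps[m][tt][vv]
            out[t][v] = sigmoid(s)
    return out


def cbam(x, w1, w2, kernel):
    """Refine x with channel attention, then spatial attention (assumed order)."""
    C, T, V = len(x), len(x[0]), len(x[0][0])
    ch = channel_attention(x, w1, w2)
    x1 = [[[ch[c] * x[c][t][v] for v in range(V)] for t in range(T)] for c in range(C)]
    sp = spatial_attention(x1, kernel)
    return [[[sp[t][v] * x1[c][t][v] for v in range(V)] for t in range(T)] for c in range(C)]
```

Calling `cbam(x, w1, w2, kernel)` returns a feature map of the same shape as `x`. Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, which is how the module emphasizes some channels and some frame-joint positions over others.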