Spatial-temporal hypergraph based on dual-stage attention network for multi-view data lightweight action recognition
Zhixuan Wu,Nan Ma,Cheng Wang,Cheng Xu,Genbao Xu,Mingxing Li
DOI: https://doi.org/10.1016/j.patcog.2024.110427
IF: 8
2024-03-22
Pattern Recognition
Abstract:For the problems of irrelevant frames and high model complexity in action recognition, we propose a Spatial–Temporal Hypergraph based on Dual-Stage Attention Network (STHG-DAN) for multi-view data lightweight action recognition. It includes two stages: Temporal Attention Mechanism based on Trainable Threshold (TAM-TT) and Hypergraph Convolution based on Dynamic Spatial–Temporal Attention Mechanism (HG-DSTAM). In the first stage, TAM-TT uses a learning threshold to extract keyframes from multi-view videos, with the multi-view data serving as a guarantee for providing more comprehensive information subsequently; In the second stage, HG-DSTAM divides the human joints into three parts: trunk, hand and leg to build spatial–temporal hypergraphs, extracts high-order features from spatial–temporal hypergraphs constructed of multi-view human body joints, inputs them into the dynamic spatial–temporal attention mechanism, and learns the intra frame correlation of multi-view data between the joint features of body parts, which can obtain the significant areas of action; We use multi-scale convolution operation and depth separable network, which can realize efficient action recognition with a few trainable parameters. We experiment on the NTU-RGB+D, NTU-RGB+D 120 and the imitating traffic police gesture dataset. The performance and accuracy of the model are better than the existing algorithms, effectively improving the machine and human body language interaction cognitive ability.
computer science, artificial intelligence,engineering, electrical & electronic