An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Yuling Xing, Jia Zhu, Yu Li, Jin Huang, Jinlong Song
DOI: https://doi.org/10.1007/s10489-022-03589-y
IF: 5.3
2022-06-14
Applied Intelligence
Abstract: Skeleton-based action recognition methods that use complete human skeletons have achieved remarkable performance, but their performance can deteriorate significantly when critical joints or frames of the skeleton sequence are occluded or disrupted. In realistic environments, however, acquiring incomplete and noisy human skeletons is inevitable. To strengthen the robustness of action recognition models, we propose an Improved Spatial Temporal Graph Convolutional Network (IST-GCN) model consisting of three modules: a Multi-dimension Adaptive Graph Convolutional Network (Md-AGCN), an Enhanced Attention Mechanism (EAM), and a Multi-Scale Temporal Convolutional Network (MS-TCN). Specifically, the Md-AGCN module first adaptively adjusts the graph structure according to the layer and to the spatial, temporal, and channel dimensions of each action sample, establishing connections between long-range joints that have dependencies. Then, the EAM module focuses on important information across the spatial domain, the temporal domain, and the channels to further strengthen the dependencies between important joints. Finally, the MS-TCN module enlarges the receptive field to extract more latent temporal dependencies. Comprehensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets demonstrate that, compared with state-of-the-art (SOTA) approaches, our approach achieves outstanding accuracy and robustness when skeleton samples are incomplete and noisy. Moreover, the parameter count and computational complexity of our model are far lower than those of existing approaches.
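The abstract does not say how MS-TCN enlarges the receptive field. One common way to do this is to run several temporal convolutions with different dilation rates in parallel and merge the branches. The sketch below illustrates that general idea on a single joint's 1D time series. It is not the authors' implementation: the function names, the use of dilation, the dilation rates (1, 2, 3), and the averaging of branches are all assumptions made for illustration.

```python
# Illustrative sketch only: a multi-branch dilated temporal convolution.
# All names and design choices here are assumptions, not the paper's MS-TCN.

def dilated_temporal_conv(seq, kernel, dilation):
    """Same-length 1D convolution over time with zero padding."""
    half = (len(kernel) // 2) * dilation  # offset that centers the kernel on t
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - half + i * dilation
            if 0 <= idx < len(seq):  # frames outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of frames spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_temporal_conv(seq, kernel, dilations=(1, 2, 3)):
    """Average the outputs of branches that use different dilation rates."""
    branches = [dilated_temporal_conv(seq, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # one active frame at t = 4
    kernel = [1.0, 1.0, 1.0]
    for d in (1, 2, 3):
        print(d, receptive_field(len(kernel), d),
              dilated_temporal_conv(impulse, kernel, d))
    print(multi_scale_temporal_conv(impulse, kernel))
```

With a kernel of size 3, the three branches span 3, 5, and 7 frames respectively, so the merged output responds to an event several frames away from its position while using the same number of weights per branch.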
computer science, artificial intelligence