Multi-Scale Spatial Temporal Graph Neural Network for Skeleton-Based Action Recognition

Dong Feng, Zhongcheng Wu, Jun Zhang, Tingting Ren
DOI: https://doi.org/10.1109/ACCESS.2021.3073107
IF: 3.9
2021-01-01
IEEE Access
Abstract: Graph convolutional networks (GCNs) have achieved remarkable performance on skeleton-based action recognition. Existing GCN-based methods usually apply a fixed graph topology and a single fixed temporal convolution kernel to extract the spatial features of joints and the temporal features, and so work from a single-scale perspective. In practice, human actions are coordinated by various body parts in the spatial domain and exhibit different characteristics in the temporal domain. It is therefore appropriate to model multi-scale information, which can enhance both explainability and stability but is ignored in the current literature. To address this issue, we propose a multi-scale spatial-temporal graph neural network (MSTGNN) to discover multi-scale discriminative features from the spatial and temporal aspects simultaneously. Our contributions are threefold: 1) For the spatial domain, inspired by the kinematics of human action, we develop three-scale graph data structures in a fine-to-coarse way. A novel hybrid spatial pooling module is then proposed to dynamically exploit the global and comprehensive information step by step. 2) For the temporal domain, we design a multi-scale temporal convolution module that adaptively fuses the temporal features extracted by convolution kernels of different scales. 3) Because it uses a one-stream architecture instead of a multi-stream architecture, the proposed model can be trained in an end-to-end manner. MSTGNN achieves state-of-the-art performance with lower computational complexity. Experimental results on two large datasets (NTU-RGB+D and NTU-RGB+D-120) demonstrate the superiority of MSTGNN.
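The multi-scale temporal convolution idea in contribution 2 can be illustrated with a minimal sketch. The abstract does not specify kernel sizes or the fusion mechanism, so everything here is an assumption for illustration: the function names, the fixed averaging kernels of sizes 3, 5 and 7, and the use of softmax weights over per-scale scores (`scale_logits`) as the "adaptive" fusion. In a trained model the kernels and scores would be learned; this is not the authors' implementation.

```python
import math

def temporal_conv(seq, kernel):
    """1D convolution over time with zero padding, so the output keeps
    the input length. Assumes an odd kernel size."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(seq))]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_scale_temporal_fusion(seq, kernels, scale_logits):
    """Run one temporal convolution per kernel scale, then fuse the
    branches with softmax weights (an assumed fusion rule)."""
    weights = softmax(scale_logits)
    branches = [temporal_conv(seq, kernel) for kernel in kernels]
    return [sum(w * branch[t] for w, branch in zip(weights, branches))
            for t in range(len(seq))]

# Hypothetical input: one joint coordinate tracked over six frames.
seq = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Assumed averaging kernels at three temporal scales (short, medium, long).
kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
# Equal scores give each scale the same weight; learned scores would not.
scale_logits = [0.0, 0.0, 0.0]
fused = multi_scale_temporal_fusion(seq, kernels, scale_logits)
```

Raising one entry of `scale_logits` shifts the fused output toward that scale's branch, which is the sense in which the fusion adapts across kernel sizes in this sketch.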
Computer Science