Graph-aware transformer for skeleton-based action recognition

Jiaxu Zhang, Wei Xie, Chao Wang, Ruide Tu, Zhigang Tu
DOI: https://doi.org/10.1007/s00371-022-02603-1
IF: 2.835
2022-07-26
The Visual Computer
Abstract: Recently, graph convolutional networks (GCNs) have played a critical role in skeleton-based human action recognition. However, most GCN-based methods still have two main limitations: (1) The semantic-level adjacency matrix of the skeleton graph is difficult to define manually, which restricts the receptive field of the GCN and limits its ability to extract spatial–temporal features. (2) The velocity information of human body joints cannot be efficiently used and fully exploited by GCNs, because a GCN does not explicitly represent the correlation between velocity vectors. To address these issues, we propose a graph-aware transformer (GAT), which makes full use of the velocity information and learns discriminative spatial–temporal motion features from sequences of skeleton graphs in a data-driven way. In addition, like GCN-based models, our GAT also considers prior structures of the human body, including the link-aware structure and the part-aware structure. Extensive experiments on three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-Skeleton, demonstrate that the proposed GAT achieves significant improvements over the GCN-based baseline for skeleton-based action recognition.
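The abstract names two ingredients that a small example can make concrete: joint velocities used as input features, and attention among joints whose weights come from the data but can be nudged by prior skeleton structure. The sketch below is a minimal illustration under our own assumptions, not the authors' GAT. It assumes velocity is the frame-to-frame difference of joint coordinates. The attention uses the velocity vectors directly as queries, keys and values, with no learned projections, a single head, and a single frame. The link-aware and part-aware priors are modeled as additive score biases. All names (`joint_velocity`, `joint_attention`, `link_bias`, `part_bias`, `part_of`) and the default bias values are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def joint_velocity(frames: Sequence[Sequence[Vec]]) -> List[List[Vec]]:
    """Frame-difference velocity: v[t][j] = x[t+1][j] - x[t][j].

    frames has shape T x J x C (time, joints, coordinates); the result
    has T-1 frames because each velocity needs two consecutive poses.
    """
    return [
        [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(f0, f1)]
        for f0, f1 in zip(frames[:-1], frames[1:])
    ]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(
    feats: Sequence[Vec],
    adjacency: Sequence[Sequence[int]],
    part_of: Sequence[int],
    link_bias: float = 1.0,
    part_bias: float = 0.5,
):
    """Self-attention over the J joints of one frame (hypothetical sketch).

    Scores are scaled dot products between joint feature vectors (used
    directly as queries, keys and values, i.e. no learned projections),
    plus additive biases for linked joints (adjacency) and for joints in
    the same body part (part_of). Returns (weights, outputs): a J x J
    matrix whose rows sum to 1, and the attention-weighted features.
    """
    d = len(feats[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for i, q in enumerate(feats):
        scores = []
        for j, k in enumerate(feats):
            # Data-driven term: similarity of the two joints' feature vectors.
            s = scale * sum(a * b for a, b in zip(q, k))
            # Assumption: structural priors enter as additive biases;
            # this is not taken from the paper.
            s += link_bias * adjacency[i][j]
            s += part_bias * (1.0 if part_of[i] == part_of[j] else 0.0)
            scores.append(s)
        weights.append(softmax(scores))
    outputs = [
        [sum(w * feats[j][c] for j, w in enumerate(row)) for c in range(d)]
        for row in weights
    ]
    return weights, outputs


if __name__ == "__main__":
    # Toy skeleton: 3 joints in a chain 0-1-2; joints 0 and 1 form one part.
    frames = [
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    ]
    vel = joint_velocity(frames)  # joints 0 and 1 move along x, joint 2 is still
    adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # links plus self-loops
    weights, _ = joint_attention(vel[0], adjacency, part_of=[0, 0, 1])
    for row in weights:
        print([round(w, 3) for w in row])
```

The scores include a dot product between velocity vectors, so joints moving in the same direction attend to each other more strongly. This is one simple way to model velocity correlations explicitly. Setting `link_bias` and `part_bias` to `0.0` removes the structural priors, and the weights then depend only on the velocity features.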
computer science, software engineering
What problem does this paper attempt to address?