Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition

Zhiyun Zheng,Yizhou Wang,Xingjin Zhang,Junfeng Wang
DOI: https://doi.org/10.3390/app12031402
2022-01-01
Applied Sciences
Abstract:Skeleton-based human action recognition has attracted extensive attention due to the robustness of the human skeleton data in the field of computer vision. In recent years, there is a trend of using graph convolutional networks (GCNs) to model the human skeleton into a spatio-temporal graph to explore the internal connections of human joints that has achieved remarkable performance. However, the existing methods always ignore the remote dependency between joints, and fixed temporal convolution kernels will lead to inflexible temporal modeling. In this paper, we propose a multi-scale adaptive aggregate graph convolution network (MSAAGCN) for skeleton-based action recognition. First, we designed a multi-scale spatial GCN to aggregate the remote and multi-order semantic information of the skeleton data and comprehensively model the internal relations of the human body for feature learning. Then, the multi-scale temporal module adaptively selects convolution kernels of different temporal lengths to obtain a more flexible temporal map. Additionally, the attention mechanism is added to obtain more meaningful joint, frame and channel information in the skeleton sequence. Extensive experiments on three large-scale datasets (NTU RGB+D 60, NTU RGB+D 120 and Kinetics-Skeleton) demonstrate the superiority of our proposed MSAAGCN.
What problem does this paper attempt to address?