Skeleton-Based Action Recognition with Select-Assemble-Normalize Graph Convolutional Networks

Haoyu Tian, Xin Ma, Xiang Li, Yibin Li
DOI: https://doi.org/10.1109/tmm.2023.3318325
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Skeleton-based action recognition has been substantially driven by the development of artificial intelligence technology and depth sensors. Recently, graph convolutional networks (GCNs) have achieved excellent performance in skeleton-based action recognition. However, the performance of GCN-based methods is impaired by inappropriate node partitioning strategies and obstructed long-range information flow. To solve these issues, a novel Select-Assemble-Normalize Graph Convolutional Network (SAN-GCN) is proposed to model the spatio-temporal features of the skeleton. First, all skeleton joints are selected as root nodes, and the neighborhoods of the root joints are assembled and normalized according to the body structure, which explicitly and interpretably expresses the spatial geometric relations among the skeleton joints. Second, we propose an attention-based assembly and normalization strategy to adaptively capture non-local joints. This adaptive assembly and normalization avoids the dilution of key long-range features. Moreover, a bi-level aggregation strategy is introduced to learn the spatio-temporal dependencies of joints: the low-level aggregation aligns the normalized neighborhood graphs, and the high-level aggregation aggregates the features of neighbor nodes with a standard convolution kernel. The high-level aggregation can conveniently realize either factorized or unified spatio-temporal aggregation. Extensive experiments on four datasets with different numbers of action patterns demonstrate that our model achieves performance comparable to state-of-the-art methods.
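The select, assemble, normalize and aggregate steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how neighborhoods are ordered or sized, so the sketch assumes a fixed neighborhood size k, ordering by hop distance on the skeleton graph, and a single shared weight matrix standing in for the "standard convolution kernel". The function names (hop_distances, assemble_neighborhoods, aggregate) and the 5-joint chain skeleton are hypothetical, and the attention-based adaptive variant is not shown.

```python
import numpy as np
from collections import deque


def hop_distances(num_joints, edges):
    """All-pairs hop distances on the skeleton graph, computed by BFS."""
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((num_joints, num_joints), np.inf)
    for root in range(num_joints):
        dist[root, root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(dist[root, v]):
                    dist[root, v] = dist[root, u] + 1
                    queue.append(v)
    return dist


def assemble_neighborhoods(dist, k):
    """Select every joint as a root and keep its k nearest joints.

    Assumption: neighbors are ordered by hop distance, with ties broken
    by joint index (stable sort), so every root gets a neighborhood of
    the same size and a deterministic ordering.
    """
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def aggregate(features, neighborhoods, weight):
    """Gather the aligned neighborhoods, then apply one shared kernel."""
    gathered = features[neighborhoods]          # (V, k, C_in)
    flat = gathered.reshape(len(features), -1)  # (V, k * C_in)
    return flat @ weight                        # (V, C_out)


# Toy example: a hypothetical 5-joint chain 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = hop_distances(5, edges)
neighborhoods = assemble_neighborhoods(dist, k=3)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2))   # 5 joints, 2 input channels
weight = rng.normal(size=(3 * 2, 4)) # shared kernel over k=3 slots, 4 output channels
out = aggregate(features, neighborhoods, weight)
print(neighborhoods)
print(out.shape)
```

Because every root joint ends up with a neighborhood of the same size and ordering, one weight matrix can be shared across all joints, which is what makes an ordinary convolution kernel applicable in this sketch.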