Abstract: Skeleton-based action recognition has achieved remarkable results with the development of graph convolutional networks (GCNs). However, recent works tend to construct complex learning mechanisms with redundant training and face a bottleneck on long time series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore an efficient learning mechanism for long temporal skeleton sequences. Firstly, a new graph learning mechanism with a simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG), is proposed to aggregate features of multiple independent topological graphs and to avoid node information being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism that optimizes the back-propagation computation of dynamic graph learning, giving a 55.08% speed-up. Finally, TSGCNeXt restructures the overall structure of the GCN with three spatio-temporal learning modules, efficiently modeling long temporal features. In comparison with previous methods on the large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt outperforms them on single-stream networks. In addition, with the EMA model introduced into the multi-stream fusion, TSGCNeXt achieves SOTA levels. On the cross-subject and cross-set benchmarks of NTU 120, accuracies reach 90.22% and 91.74%.

What problem does this paper attempt to address?

The paper addresses two bottlenecks of existing graph convolutional network (GCN) methods for skeleton-based action recognition: complex learning mechanisms that lower training efficiency, and a limited ability to handle long time series. Specifically:

1. **Complex learning mechanisms**: Recent works tend to introduce complex dynamic graph learning mechanisms. These can improve model performance, but they also add redundant training computation and so reduce training efficiency. The inefficiency becomes more pronounced on long time series.
2. **Bottlenecks in long time-series learning**: To keep the parameter count and computational cost manageable, existing methods usually shorten the time series, which loses fine-grained temporal information. In addition, the accuracy of some methods drops when they are trained on long time series, which limits how well they can learn from such data.

In response, the paper proposes **Temporal-Spatio Graph ConvNeXt (TSGCNeXt)**, which targets these problems through the following improvements:

- **New graph learning mechanism**: The **Dynamic-Static Separate Multi-graph Convolution (DS-SMG)** module aggregates the features of multiple independent topological graphs and avoids node information being ignored during dynamic convolution.
- **Graph convolution training acceleration mechanism**: The back-propagation computation of dynamic graph learning is optimized, yielding a 55.08% training speed-up.
- **Overall structure optimization**: The overall structure of the GCN is restructured with three spatio-temporal learning modules that efficiently model long temporal features.

With these improvements, TSGCNeXt outperforms previous methods on single-stream networks and reaches state-of-the-art levels with multi-stream fusion. Experiments on the large-scale datasets NTU RGB+D 60 and 120 show that TSGCNeXt performs better than existing methods, especially on long time-series data.
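To make the idea of aggregating several independent graphs through separate static and dynamic branches more concrete, here is a minimal NumPy sketch. It is not the paper's DS-SMG implementation: the function name `multi_graph_conv`, the attention-style construction of the dynamic adjacency, and all shapes and weight names are assumptions chosen for illustration. The only point it shows is that keeping a static branch alongside the dynamic one means every joint still contributes even when the dynamic weights are small.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_graph_conv(x, static_adj, w_static, w_query, w_key, w_dynamic):
    """Hypothetical multi-graph convolution with separate static/dynamic branches.

    x:          (T, V, C_in)   frames, joints, input channels
    static_adj: (K, V, V)      K independent static graphs (assumed learnable)
    w_static:   (K, C_in, C_out)
    w_query:    (K, C_in, D)   projections used to build the dynamic graphs
    w_key:      (K, C_in, D)
    w_dynamic:  (K, C_in, C_out)
    """
    num_graphs = static_adj.shape[0]
    t, v, _ = x.shape
    out = np.zeros((t, v, w_static.shape[-1]))
    pooled = x.mean(axis=0)  # (V, C_in): joint features averaged over time

    for k in range(num_graphs):
        # Static branch: a fixed topology mixes joints, then channels are projected.
        out += np.einsum("uv,tvc->tuc", static_adj[k], x) @ w_static[k]

        # Dynamic branch (assumed form): adjacency inferred from the input itself.
        q = pooled @ w_query[k]    # (V, D)
        key = pooled @ w_key[k]    # (V, D)
        dyn_adj = softmax(q @ key.T / np.sqrt(q.shape[-1]), axis=-1)  # (V, V)
        out += np.einsum("uv,tvc->tuc", dyn_adj, x) @ w_dynamic[k]

    return out


# Illustrative sizes only: 64 frames, 25 joints, 3 input channels, 3 graphs.
rng = np.random.default_rng(0)
T, V, C_in, C_out, K, D = 64, 25, 3, 8, 3, 4
x = rng.standard_normal((T, V, C_in))
static_adj = np.stack([np.eye(V)] * K)  # identity graphs as a placeholder
w_static = rng.standard_normal((K, C_in, C_out))
w_query = rng.standard_normal((K, C_in, D))
w_key = rng.standard_normal((K, C_in, D))
w_dynamic = rng.standard_normal((K, C_in, C_out))

y = multi_graph_conv(x, static_adj, w_static, w_query, w_key, w_dynamic)
print(y.shape)  # (64, 25, 8)
```

In this sketch the two branches are simply summed. How the actual model combines them, and how its acceleration mechanism restructures the backward pass, is not described in this summary.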

TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential
