Abstract: Skeleton‐based human action recognition is attracting significant attention and is widely applied in fields such as virtual reality and human‐computer interaction systems. Recent studies have shown that graph convolutional network (GCN) based methods are effective for this task and have markedly improved prediction accuracy. However, most GCN‐based methods overlook the varying contributions of the self, centripetal and centrifugal subsets. In addition, they adopt only a single‐scale temporal feature and ignore multi‐temporal scale information. To address these issues, we first develop a refinement graph convolution that adaptively learns a weight for each subset feature, so that the importance of the different skeleton subsets can be differentiated. Second, we propose a multi‐temporal scale aggregation module to extract more discriminative temporal dynamic information. Building on these two components, we propose a multi‐temporal scale aggregation refinement graph convolutional network (MTSA‐RGCN) and adopt a four‐stream structure, which comprehensively models complementary features and yields a significant performance boost. In experiments on the NTU‐RGB+D 60 and NTU‐RGB+D 120 datasets, our approach achieves notable improvements over other state‐of‐the‐art methods.

Figure: The overall pipeline of the proposed method. The skeleton data is first fed into the RGCN to obtain basic feature representations; the RGCN learns richer spatial motion information about actions. Features with different temporal resolutions are then modulated in the temporal and spatial dimensions and aggregated into features with rich discriminative temporal information for final classification.
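
The two ideas named in the abstract, a per‐subset learned weight in the graph convolution and the aggregation of features at several temporal resolutions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the softmax normalisation of the subset weights, and the pool/upsample/average scheme with strides (1, 2, 4) are all assumptions made for illustration, and the sketch omits the temporal and spatial modulation step that the pipeline description mentions.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def subset_weighted_graph_conv(x, adjacency, theta, subset_logits):
    """Graph convolution with one learnable scalar weight per subset.

    x:             (T, V, C_in) features for T frames and V joints
    adjacency:     (K, V, V), one matrix per subset
                   (e.g. self, centripetal, centrifugal)
    theta:         (K, C_in, C_out) channel projection per subset
    subset_logits: (K,) learnable scalars (hypothetical parameterisation)
    """
    # Assumption: the subset weights are normalised with a softmax.
    alpha = softmax(subset_logits)
    out = np.zeros(x.shape[:2] + (theta.shape[2],))
    for k in range(adjacency.shape[0]):
        # Aggregate neighbours belonging to subset k for every frame.
        agg = np.einsum("uv,tvc->tuc", adjacency[k], x)
        # Project channels and scale by the weight of this subset.
        out += alpha[k] * (agg @ theta[k])
    return out


def multi_temporal_scale_aggregate(feat, strides=(1, 2, 4)):
    """Average features computed at several temporal resolutions.

    feat: (T, V, C). Each branch average-pools over time with stride s,
    repeats frames to restore length T, and the branches are averaged.
    This is a simplified stand-in for a multi-temporal scale module.
    """
    T = feat.shape[0]
    assert T >= max(strides), "sequence shorter than the largest stride"
    branches = []
    for s in strides:
        usable = (T // s) * s
        pooled = feat[:usable].reshape(T // s, s, *feat.shape[1:]).mean(axis=1)
        restored = np.repeat(pooled, s, axis=0)
        if restored.shape[0] < T:
            # Pad the tail by repeating the last pooled frame.
            pad = np.repeat(restored[-1:], T - restored.shape[0], axis=0)
            restored = np.concatenate([restored, pad], axis=0)
        branches.append(restored)
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, joints, c_in, c_out = 16, 25, 3, 8
    x = rng.normal(size=(frames, joints, c_in))
    # Toy adjacency: identity for the self subset, random for the others.
    adj = np.stack([np.eye(joints),
                    rng.random((joints, joints)),
                    rng.random((joints, joints))])
    theta = rng.normal(size=(3, c_in, c_out))
    spatial = subset_weighted_graph_conv(x, adj, theta, np.zeros(3))
    fused = multi_temporal_scale_aggregate(spatial)
    print(spatial.shape, fused.shape)
```

With all subset logits equal, the three subsets contribute equally, which corresponds to the unweighted case the abstract criticises. In a trained model the logits would be learned so that one subset can dominate.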

A Spatiotemporal Fusion Network for Skeleton-Based Action Recognition

Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition

Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Joints-Centered Spatial-Temporal Features Fused Skeleton Convolution Network for Action Recognition

Hybrid Features for Skeleton-Based Action Recognition Based on Network Fusion

Triplet Attention Multiple Spacetime-Semantic Graph Convolutional Network for Skeleton-Based Action Recognition

Densely Connected and Multiple Temporal Graph Convolution Networks for Skeleton-based Action Recognition

Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Temporal Refinement Graph Convolutional Network for Skeleton-based Action Recognition

Adaptive Spatiotemporal Graph Convolutional Network with Intermediate Aggregation of Multi-Stream Skeleton Features for Action Recognition

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Glimpse and Zoom: Spatio-Temporal Focused Dynamic Network for Skeleton-based Action Recognition

Two-Stream Temporal Convolutional Networks for Skeleton-Based Human Action Recognition

A Spatio-temporal Hybrid Network for Action Recognition

Skeleton-based Action Recognition with Multi-stream, Temporal-Channel Enhanced Graph Convolution Network

A Graph Skeleton Transformer Network for Action Recognition

TFC-GCN: Lightweight Temporal Feature Cross-Extraction Graph Convolutional Network for Skeleton-Based Action Recognition

Multi‐temporal scale aggregation refinement graph convolutional network for skeleton‐based action recognition

Temporal Enhanced Multi-Stream Graph Convolutional Neural Networks For Skeleton-Based Action Recognition