Abstract: Skeleton‐based human action recognition is gaining significant attention and is widely applied in fields such as virtual reality and human‐computer interaction systems. Recent studies have shown that graph convolutional network (GCN) based methods are effective for this task and have markedly improved prediction accuracy. However, most GCN‐based methods overlook the varying contributions of the self, centripetal and centrifugal subsets. In addition, they adopt only a single‐scale temporal feature and ignore multi‐temporal scale information. To address these issues, we first develop a refinement graph convolution that adaptively learns a weight for each subset feature, so that the importance of the different skeleton subsets can be differentiated. Second, we propose a multi‐temporal scale aggregation module to extract more discriminative temporal dynamic information. Building on both, we propose a multi‐temporal scale aggregation refinement graph convolutional network (MTSA‐RGCN) and adopt a four‐stream structure, which comprehensively models complementary features and yields a significant performance boost. In experiments on the NTU‐RGB+D 60 and NTU‐RGB+D 120 datasets, our approach improves substantially on other state‐of‐the‐art methods.

Figure: The overall pipeline of the proposed method. The skeleton data is first fed into the RGCN (the refinement graph convolutional network) to obtain basic feature representations; the RGCN learns richer spatial motion information about actions. Features with different temporal resolutions are then modulated in the temporal and spatial dimensions and aggregated into features with rich discriminative temporal information for final classification.
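The idea of learning a weight for each skeleton subset can be illustrated with a small sketch. This is not the authors' implementation: the function names (`softmax`, `matmul`, `refined_aggregate`), the use of a softmax to normalize the subset weights, the toy 3-joint chain, and the hand-written adjacency matrices are all assumptions made for illustration. The abstract only states that a weight is learned adaptively for each of the self, centripetal and centrifugal subset features.

```python
import math

def softmax(logits):
    """Turn unconstrained scores into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def refined_aggregate(subset_adjs, features, subset_logits):
    """Aggregate joint features per subset, then mix the subsets.

    subset_adjs   : adjacency matrices (self, centripetal, centrifugal)
    features      : joints x channels feature matrix
    subset_logits : one score per subset; in a trained network these
                    would be learned parameters (assumed here)
    """
    weights = softmax(subset_logits)
    n_joints, n_channels = len(features), len(features[0])
    out = [[0.0] * n_channels for _ in range(n_joints)]
    for w, adj in zip(weights, subset_adjs):
        aggregated = matmul(adj, features)
        for i in range(n_joints):
            for j in range(n_channels):
                out[i][j] += w * aggregated[i][j]
    return out, weights

# Toy chain 0 - 1 - 2 with joint 1 taken as the skeleton center (assumption).
self_adj = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
centripetal_adj = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]      # neighbor nearer the center
centrifugal_adj = [[0, 0, 0], [0.5, 0, 0.5], [0, 0, 0]]  # neighbors farther away
features = [[1.0], [2.0], [3.0]]

out, weights = refined_aggregate(
    [self_adj, centripetal_adj, centrifugal_adj], features, [0.0, 0.0, 0.0])
```

With equal scores the three subsets contribute equally. Raising the score of one subset shifts the output toward that subset's aggregated features, which is the behavior a learned per-subset weight is meant to provide.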

3D Skeleton Based Action Recognition by Video-Domain Translation-Scale Invariant Mapping and Multi-Scale Dilated CNN

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

Action Recognition with Domain Invariant Features of Skeleton Image

3D Action Recognition Using Multi-Temporal Skeleton Visualization

Skeleton-based Action Recognition Using LSTM and CNN

3D Action Recognition Using Data Visualization and Convolutional Neural Networks

End-to-end Learning of Deep Convolutional Neural Network for 3D Human Action Recognition

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

Investigation of Different Skeleton Features for CNN-based 3D Action Recognition

Channel attention and multi-scale graph neural networks for skeleton-based action recognition

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton Based Action Recognition

Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition

Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN)

One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching

Spectral studies on metal-ligand bonding of novel rhodanine azodye sulphadrugs.

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Multi‐temporal scale aggregation refinement graph convolutional network for skeleton‐based action recognition