Abstract:<p>The skeletal data has been an alternative for the human action recognition task as it provides more compact and distinct information compared to the traditional RGB input. However, unlike the RGB input, the skeleton data lies in a non-Euclidean space that traditional deep learning methods are not able to use their fullest potential. Fortunately, with the emerging trend of Geometric deep learning, the spatial-temporal graph convolutional network (ST-GCN) has been proposed to deal with the action recognition problem from skeleton data. ST-GCN and its variants fit well with skeleton-based action recognition and are becoming the mainstream frameworks for this task. However, the efficiency and the performance of the task are hindered by either fixing the skeleton joint correlations or providing a computational expensive strategy to construct a dynamic topology for the skeleton. We argue that many of these operations are either unnecessary or even harmful for the task. By theoretically and experimentally analysing the state-of-the-art ST-GCNs, we provide a simple but efficient strategy to capture the global graph correlations and thus efficiently model the representation of the input graph sequences. Moreover, the global graph strategy also reduces the graph sequence into the Euclidean space, thus a multi-scale temporal filter is introduced to efficiently capture the dynamic information. With the method, we are not only able to better extract the graph correlations with much fewer parameters (only 12.6% of the current best), but we also achieve a superior performance. Extensive experiments on current largest 3D datasets, NTU-RGB+D and NTU-RGB+D 120, demonstrate the ability of our network to perform efficient and lightweight priority on this task.</p>

STGL-GCN: Spatial–Temporal Mixing of Global and Local Self-Attention Graph Convolutional Networks for Human Action Recognition

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

GLTA-GCN: Global-Local Temporal Attention Graph Convolutional Network for Unsupervised Skeleton-Based Action Recognition

Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Temporal Enhanced Multi-Stream Graph Convolutional Nerual Networks For Skeleton-Based Action Recognition

SelfGCN: Graph Convolution Network With Self-Attention for Skeleton-Based Action Recognition

Spatial‐temporal Slowfast Graph Convolutional Network for Skeleton‐based Action Recognition

Enhanced Spatial and Extended Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition

Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Multilevel Spatial-Temporal Excited Graph Network for Skeleton-Based Action Recognition

Skeleton action recognition via graph convolutional network with self-attention module

Rethinking the ST-GCNs for 3D skeleton-based human action recognition

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition.

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition

Lightweight Multi-Scale Spatiotemporal Graph Convolutional Network for Skeleton-Based Action Recognition

Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition