Abstract: The critical problem in skeleton-based action recognition is to extract high-level semantics from the dynamic changes between skeleton joints. Graph Convolutional Networks (GCNs) are therefore widely applied to capture the spatial-temporal information of dynamic joint coordinates by graph-based convolution. However, previous GCNs with a fixed graph convolution kernel are limited by the static topology of graphs and by the geometric variations of actions. Moreover, the local information of adjacent nodes of the graph is aggregated layer by layer, which increases the model complexity. In this work, we propose a Deformable Graph Convolutional Transformer (DGT) for skeleton-based action recognition that extracts adaptive features through a flexible, learnable receptive field. DGT adopts a multiple-input-branches (MIB) architecture to obtain several kinds of information, such as joints, bones, and motions. The multiple features are fused in the Transformer Classifier. Spatial-Temporal Graph Convolution (STGC) units are then used to learn a preliminary feature representation that captures both spatial and temporal dependencies on the graph. Next, a deformable spatial-temporal compound attention backbone learns a robust representation from adaptive deformable skeleton features. The adaptive representation is obtained by dynamically adjusting the receptive field through an offset-based convolution method. In addition, a self-attention-based transformer classifier (TC) is designed to encode the sequence of features flattened over the spatial and temporal dimensions. Its fully-connected attention mechanism further strengthens the high-level semantic representation by focusing on essential nodes in the graph. We evaluate DGT on two challenging large-scale datasets, NTU-RGBD 60 and NTU-RGBD 120. Experimental results support the efficacy of DGT in adaptively optimizing the attention for different joints. DGT achieves performance comparable to the state of the art while being much more efficient, which demonstrates the effectiveness of the proposed method.
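The joint, bone, and motion inputs named in the abstract can be illustrated with a minimal sketch. The abstract does not define these streams, so the code assumes conventions common in skeleton-based recognition: a bone is the vector from a joint's parent to the joint, and a motion is the per-joint displacement between consecutive frames. The function name `build_branches`, the `parents` list, and the toy skeleton are hypothetical and are not taken from the DGT implementation.

```python
# Hypothetical helper: derive three input streams (joint, bone, motion)
# from raw joint coordinates. The bone and motion definitions are assumed
# conventions, not details confirmed by the abstract.

def build_branches(joints, parents):
    """joints: list of T frames, each a list of V (x, y, z) tuples.
    parents: list of V parent indices; the root joint points to itself."""
    num_frames = len(joints)

    # Bone stream: vector from each joint's parent to the joint itself.
    # The root joint gets a zero vector because it is its own parent.
    bones = [
        [tuple(c - p for c, p in zip(frame[v], frame[parents[v]]))
         for v in range(len(frame))]
        for frame in joints
    ]

    # Motion stream: per-joint displacement to the next frame. The last
    # frame is padded with zeros so all streams keep the same length T.
    motions = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            motions.append([
                tuple(n - c for c, n in zip(cur, nxt))
                for cur, nxt in zip(joints[t], joints[t + 1])
            ])
        else:
            motions.append([(0.0,) * len(j) for j in joints[t]])

    return {"joint": joints, "bone": bones, "motion": motions}


# Toy 3-joint chain (0 -> 1 -> 2) observed over two frames.
toy_joints = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0)],
]
toy_parents = [0, 0, 1]
branches = build_branches(toy_joints, toy_parents)
```

In a multi-branch design of this kind, each stream would typically be fed to its own input branch before the features are fused; how DGT performs that fusion is described only at the level of the abstract above.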

Supplement Material of Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition

DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition

DD-GCN: Directed Diffusion Graph Convolutional Network for Skeleton-based Human Action Recognition

Research on decoupled adaptive graph convolution networks based on skeleton data for action recognition

Temporal Decoupling Graph Convolutional Network for Skeleton-Based Gesture Recognition

TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential

On Dropping Clusters to Regularize Graph Convolutional Neural Networks

Optimized Skeleton-based Action Recognition via Sparsified Graph Regression

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Rethinking the ST-GCNs for 3D skeleton-based human action recognition

Graph2Net: Perceptually-Enriched Graph Learning for Skeleton-Based Action Recognition

Multidimensional Refinement Graph Convolutional Network With Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Dynamic spatial-temporal topology graph network for skeleton-based action recognition

Attentional weighting strategy-based dynamic GCN for skeleton-based action recognition

SpatioTemporal Focus for Skeleton-based Action Recognition

A Tri-Attention Enhanced Graph Convolutional Network for Skeleton-Based Action Recognition

Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Deformable graph convolutional transformer for skeleton-based action recognition

Graph transformer network with temporal kernel attention for skeleton-based action recognition

Feedback Graph Convolutional Network for Skeleton-Based Action Recognition

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness