Abstract: Skeleton representation has attracted a great deal of attention recently as an extremely robust feature for human action recognition. However, its non-Euclidean structural characteristics raise new challenges for conventional solutions. Recent studies have shown that a Graph Convolutional Network (GCN) is naturally well suited to modeling spatiotemporal skeleton information. Nevertheless, skeleton graph modeling normally focuses on the physical adjacency of the elements of the human skeleton sequence, which is at odds with the requirement to provide a perceptually meaningful representation. To address this problem, we propose a perceptually-enriched graph learning method that introduces innovative features to spatial and temporal skeleton graph modeling. For spatial information modeling, we incorporate a Local-Global Graph Convolutional Network (LG-GCN) that builds a multifaceted spatial perceptual representation. This helps to overcome the limitations caused by over-reliance on the spatial adjacency relationships in the skeleton. For temporal modeling, we present a Region-Aware Graph Convolutional Network (RA-GCN), which directly embeds the regional relationships conveyed by a skeleton sequence into a temporal graph model. This innovation mitigates the deficiency of the original skeleton graph models. In addition, we strengthen the ability of the proposed channel modeling methods to extract multi-scale representations. These innovations result in a lightweight graph convolutional model, referred to as Graph2Net, that simultaneously extends the spatial and temporal perceptual fields and thus enhances the capacity of the graph model to represent skeleton sequences. We conduct extensive experiments on the NTU-RGB+D 60 & 120, Northwestern-UCLA, and Kinetics-400 datasets to show that our results surpass the performance of several mainstream methods while limiting the model complexity and computational overhead.
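To make concrete the physical-adjacency modeling that the abstract describes as the conventional starting point, the following is a minimal sketch of a generic spatial graph convolution over skeleton joints. It is not the LG-GCN or RA-GCN proposed in the paper: the symmetric normalization, the toy 5-joint chain, the tensor shapes, and the function names `normalized_adjacency` and `spatial_graph_conv` are all assumptions chosen for illustration.

```python
import numpy as np

def normalized_adjacency(num_joints, bones):
    """Return D^{-1/2} (A + I) D^{-1/2} for a list of (i, j) bone pairs.

    Assumption: the standard symmetric GCN normalization with self-loops;
    the paper's own graph construction is not given in the abstract.
    """
    a = np.eye(num_joints)              # self-loops (A + I)
    for i, j in bones:
        a[i, j] = 1.0                   # undirected physical connection
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spatial_graph_conv(x, a_hat, w):
    """Aggregate neighboring joints per frame, then mix channels.

    x:     (T, V, C_in)  frames, joints, input channels
    a_hat: (V, V)        normalized adjacency
    w:     (C_in, C_out) channel weights
    """
    return np.einsum("uv,tvc,cd->tud", a_hat, x, w)

# Hypothetical toy skeleton: 5 joints connected in a chain.
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(5, bones)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 3))      # 8 frames, 5 joints, 3 coordinates
w = rng.standard_normal((3, 16))        # illustrative random weights
out = spatial_graph_conv(x, a_hat, w)   # shape (8, 5, 16)
```

In this sketch the entry `a_hat[0, 4]` is zero, so the two ends of the chain exchange no information within a single layer. That is the kind of reliance on physical adjacency the abstract argues against when it motivates extending the spatial and temporal perceptual fields.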

Overcomplete graph convolutional denoising autoencoder for noisy skeleton action recognition

Predictively Encoded Graph Convolutional Network for Noise-Robust Skeleton-based Action Recognition

Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Prompt-supervised dynamic attention graph convolutional network for skeleton-based action recognition

DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition

Generalized Graph Convolutional Networks for Skeleton-based Action Recognition

Multidimensional Refinement Graph Convolutional Network With Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition

Feedback Graph Convolutional Network for Skeleton-Based Action Recognition

Optimized Skeleton-based Action Recognition via Sparsified Graph Regression

Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition

Forward-reverse adaptive graph convolutional networks for skeleton-based action recognition

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks

Skeleton action recognition via graph convolutional network with self-attention module

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Feature reconstruction graph convolutional network for skeleton-based action recognition

Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition

Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition

Graph2Net: Perceptually-Enriched Graph Learning for Skeleton-Based Action Recognition