Abstract: Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition, leveraging their ability to unravel the complex dynamics of human joint topology through the graph's adjacency matrix. However, an inherent flaw has come to light in these cutting-edge models: they tend to optimize the adjacency matrix jointly with the model weights. This process, while seemingly efficient, causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map. As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances. This approach preserves the vital topological nuances often lost in conventional GCNs. (2) We highlight an oft-overlooked feature - the temporal mean of a skeletal sequence, which, despite its modest guise, carries highly action-specific information. (3) Our investigation revealed strong variations in joint-to-joint relationships across different actions. This finding exposes the limitations of a single adjacency matrix in capturing the variations of relational configurations emblematic of human movement, which we remedy by proposing an efficient refinement to Graph Convolutions (GC) - the BlockGC. This evolution slashes parameters by a substantial margin (above 40%), while elevating performance beyond original GCNs. Our full model, the BlockGCN, establishes new standards in skeleton-based action recognition for small model sizes. Its high accuracy, notably on the large-scale NTU RGB+D 120 dataset, stands as compelling proof of the efficacy of BlockGCN.

What problem does this paper attempt to address?

This paper addresses the gradual loss of skeletal topology information in the Graph Convolutional Networks (GCNs) used for skeleton-based action recognition. GCNs capture the relationships between human joints through the graph's adjacency matrix, but these models optimize the adjacency matrix jointly with the model weights. During training, the bone connectivity encoded in the matrix gradually degrades, and the model ends up insensitive to the topology it was originally meant to map. This phenomenon is referred to as "catastrophic forgetting".

Existing GCNs also struggle with the way joint-to-joint relationships change from one action to another, because a single adjacency matrix cannot capture the range of relational configurations found in human movement. To overcome these problems, the paper proposes a threefold strategy:

1. **Redefine skeletal connectivity**: Bone connectivity is encoded using graph distances, which preserves topological details that are often lost in conventional GCNs.
2. **Emphasize the temporal mean of skeletal sequences**: Although this feature seems simple, it carries highly action-specific information.
3. **Propose BlockGC to handle the strong variation in joint relationships across actions**: BlockGC is an efficient refinement of graph convolution that reduces the number of parameters by more than 40% while improving performance over the original GCNs.

With these three contributions, the full model, BlockGCN, sets a new standard in skeleton-based action recognition for small model sizes. Its high accuracy, especially on the large-scale NTU RGB+D 120 dataset, supports the effectiveness of the approach.
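
As a rough illustration of strategies 1 and 2, the sketch below computes pairwise hop distances on a toy skeleton and the per-joint temporal mean of a short sequence. The five-joint skeleton, the function names `hop_distances` and `temporal_mean`, and the use of breadth-first search are assumptions made for this example; the summary above does not say how the paper computes or encodes graph distances, so this should not be read as the authors' implementation.

```python
from collections import deque

# Hypothetical toy skeleton (NOT an NTU RGB+D layout): 5 joints, 4 bones.
TOY_NUM_JOINTS = 5
TOY_BONES = [(0, 1), (1, 2), (1, 3), (3, 4)]


def hop_distances(num_joints, bones):
    """Return an all-pairs matrix of shortest-path hop counts between joints.

    Unreachable pairs keep the value -1. Breadth-first search is used because
    every bone counts as one hop (an unweighted graph).
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dist = [[-1] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == -1:  # not visited yet from this source
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def temporal_mean(sequence):
    """Average each joint's (x, y, z) position over all frames.

    `sequence[t][j]` is the 3D coordinate of joint j at frame t.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    return [
        tuple(
            sum(sequence[t][j][k] for t in range(num_frames)) / num_frames
            for k in range(3)
        )
        for j in range(num_joints)
    ]
```

For the toy skeleton, `hop_distances(TOY_NUM_JOINTS, TOY_BONES)[2][4]` is 3, since the only path between joints 2 and 4 runs through joints 1 and 3. Unlike a learned adjacency matrix, such a distance matrix is fixed by the skeleton and cannot drift during training, which is the intuition behind using it to retain topology.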

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

Part-Wise Adaptive Topology Graph Convolutional Network for Skeleton-Based Action Recognition

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Effective skeleton topology and semantics-guided adaptive graph convolution network for action recognition

Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition

Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks

Attention-Guided and Topology-Enhanced Shift Graph Convolutional Network for Skeleton-Based Action Recognition

Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition

Dynamic spatial-temporal topology graph network for skeleton-based action recognition

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Attentional weighting strategy-based dynamic GCN for skeleton-based action recognition

A New Adjacency Matrix Configuration in GCN-based Models for Skeleton-based Action Recognition

Accommodating Self-attentional Heterophily Topology into High- and Low-pass Graph Convolutional Network for Skeleton-based Action Recognition

Attention adjacency matrix based graph convolutional networks for skeleton-based action recognition

Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition

Improved Graph Convolutional Network with Enriched Graph Topology Representation for Skeleton-Based Action Recognition

Optimized Skeleton-based Action Recognition via Sparsified Graph Regression

An overview of Graph Convolutional Networks in skeleton-based action recognition

Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition

Feature reconstruction graph convolutional network for skeleton-based action recognition