Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie
2024-03-04
Abstract: Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition, leveraging their ability to unravel the complex dynamics of human joint topology through the graph's adjacency matrix. However, an inherent flaw has come to light in these cutting-edge models: they tend to optimize the adjacency matrix jointly with the model weights. This process, while seemingly efficient, causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map. As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances. This approach preserves the vital topological nuances often lost in conventional GCNs. (2) We highlight an oft-overlooked feature: the temporal mean of a skeletal sequence, which, despite its modest guise, carries highly action-specific information. (3) Our investigation revealed strong variations in joint-to-joint relationships across different actions. This finding exposes the limitations of a single adjacency matrix in capturing the variations of relational configurations emblematic of human movement, which we remedy by proposing an efficient refinement to Graph Convolutions (GC): the BlockGC. This evolution slashes parameters by a substantial margin (above 40%), while elevating performance beyond original GCNs. Our full model, the BlockGCN, establishes new standards in skeleton-based action recognition for small model sizes. Its high accuracy, notably on the large-scale NTU RGB+D 120 dataset, stands as compelling proof of the efficacy of BlockGCN.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a weakness of existing skeleton-based action recognition methods: Graph Convolutional Networks (GCNs) gradually lose skeletal topology information during training. GCNs capture the complex relationships between human joints through the graph's adjacency matrix, but these models optimize the adjacency matrix jointly with the model weights. As training proceeds, the bone-connectivity information originally encoded in the matrix degrades, until the model becomes insensitive to the very topology it was meant to capture. This phenomenon is referred to as "Catastrophic Forgetting". Existing GCNs also struggle with the fact that joint-to-joint relationships vary strongly from one action to another: a single adjacency matrix cannot capture these varying relational configurations of human movement.

To overcome these problems, the paper proposes a threefold strategy:

1. **Redefine skeletal connectivity**: bone connectivity is encoded through graph distances between joints, which preserves topological details that conventional GCNs tend to lose.
2. **Exploit the temporal mean of a skeletal sequence**: although this feature looks simple, it carries highly action-specific information.
3. **Introduce BlockGC**: motivated by the strong variation of joint relationships across actions, BlockGC is an efficient refinement of graph convolution that models these varying relationships while reducing the number of parameters by more than 40% and improving performance over the original GCNs.
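Items 1 and 2 above can be made concrete with a small sketch. It assumes, by analogy with relative position encodings, that each pairwise hop distance on the bone graph indexes a learnable table, and it shows only the averaging step of the temporal-mean feature, not how the paper uses that feature afterwards. The toy 5-joint skeleton, the `seq[t][v][c]` layout and the names `hop_distances`, `distance_bias` and `temporal_mean` are hypothetical, not taken from the authors' code.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """All-pairs shortest-path hop counts on the (undirected) bone graph, via BFS."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[-1] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] < 0:  # first visit in BFS gives the shortest hop count
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    if any(d < 0 for row in dist for d in row):
        raise ValueError("skeleton graph is not connected")
    return dist


def distance_bias(dist, table):
    """Map each pairwise hop distance to a value from `table`.

    `table` stands in for a learnable per-distance embedding (an assumption,
    not the paper's code): joint pairs at the same graph distance share a value.
    """
    return [[table[d] for d in row] for row in dist]


def temporal_mean(seq):
    """Average a sequence laid out as seq[t][v][c] over frames t -> mean[v][c]."""
    num_frames = len(seq)
    num_joints, num_channels = len(seq[0]), len(seq[0][0])
    return [[sum(seq[t][v][c] for t in range(num_frames)) / num_frames
             for c in range(num_channels)]
            for v in range(num_joints)]


# Hypothetical toy skeleton: joint 1 is a hub connected to joints 0, 2, 3, 4.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
dist = hop_distances(5, BONES)       # e.g. dist[0][2] == 2 (via joint 1)
table = [0.5, 0.2, -0.1]             # hypothetical values for distances 0, 1, 2
bias = distance_bias(dist, table)    # 5 x 5, equal values for equal distances

# Hypothetical sequence: T=4 frames, V=5 joints, C=2 channels.
seq = [[[float(t + v)] * 2 for v in range(5)] for t in range(4)]
static = temporal_mean(seq)          # static[0] == [1.5, 1.5]
```

Because the lookup is keyed by fixed graph distances, the grouping of joint pairs by their distance along the bones stays encoded in the bias even while the table values are trained, which is the property item 1 relies on.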
With these three contributions, the full model, BlockGCN, sets a new standard in skeleton-based action recognition among small models; its high accuracy, especially on the large-scale NTU RGB+D 120 dataset, demonstrates the effectiveness of BlockGCN.
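The parameter saving behind the small-model claim comes from BlockGC (item 3 above). A minimal sketch of the idea, assuming BlockGC splits channels into groups that each get their own adjacency matrix and one block of a block-diagonal weight; this is our reading, not the authors' implementation. The helpers `matmul`, `block_gc`, `gc_params` and `block_gc_params`, and the sizes used (25 joints, 64 channels, 3 subsets, 8 groups), are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of row-major lists a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def block_gc(x, adjs, weights):
    """Grouped graph convolution on one frame (an assumed BlockGC-like sketch).

    x:       V x C_in node features.
    adjs:    K adjacency matrices (V x V), one per channel group.
    weights: K projection blocks, each (C_in/K) x (C_out/K).
    Concatenating the per-group outputs is equivalent to a single
    block-diagonal C_in x C_out weight with K diagonal blocks.
    """
    groups = len(adjs)
    c_in = len(x[0])
    if c_in % groups:
        raise ValueError("C_in must be divisible by the number of groups")
    g = c_in // groups
    out = [[] for _ in x]
    for k in range(groups):
        x_k = [row[k * g:(k + 1) * g] for row in x]       # channel group k
        y_k = matmul(matmul(adjs[k], x_k), weights[k])     # aggregate, then project
        for v, row in enumerate(y_k):
            out[v].extend(row)
    return out


def gc_params(c_in, c_out, num_joints, num_subsets):
    """Dense multi-subset GC: a full weight and a learnable V x V adjacency per subset."""
    return num_subsets * (c_in * c_out + num_joints ** 2)


def block_gc_params(c_in, c_out, num_joints, groups):
    """Block-diagonal GC: K blocks of (C_in/K) x (C_out/K) plus one adjacency per group."""
    return groups * (c_in // groups) * (c_out // groups) + groups * num_joints ** 2


# Toy forward pass: 2 joints, 2 channels, 2 groups of 1 channel each.
x = [[1.0, 1.0], [2.0, 4.0]]
adjs = [[[1, 0], [0, 1]],    # group 0: each joint keeps its own features
        [[0, 1], [1, 0]]]    # group 1: a different topology, joints swap features
weights = [[[2.0]], [[3.0]]]
y = block_gc(x, adjs, weights)                   # [[2.0, 12.0], [4.0, 3.0]]

# Hypothetical sizes: 25 joints, 64 channels, 3 subsets vs 8 groups.
dense = gc_params(64, 64, 25, num_subsets=3)     # 14163
block = block_gc_params(64, 64, 25, groups=8)    # 5512
```

In this sketch each channel group sees its own topology, which is how a single layer can represent several joint-to-joint relationships at once, while the weight cost drops from C_in·C_out per subset to C_in·C_out/K in total. The counts above describe only this toy configuration; how they relate to the paper's reported reduction of more than 40% depends on its baseline and settings, which the sketch does not reproduce.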