Dynamic spatial-temporal topology graph network for skeleton-based action recognition

Lian Chen, Ke Lu, Zehai Niu, Runchen Wei, Jian Xue
DOI: https://doi.org/10.1007/s00530-024-01531-5
IF: 3.9
2024-10-30
Multimedia Systems
Abstract: Over the past few years, skeleton-based action recognition has gained significant attention for its simple yet robust representation of the human body structure. Many researchers have employed Graph Convolutional Networks (GCNs) to explore discriminative features of skeletons, achieving notable success. Nevertheless, because conventional GCNs share a single topology across frames and channels, they may fail to efficiently capture joint dependencies in the spatial-temporal dimensions, and these dependencies are crucial for action understanding. To address this issue, we propose a Dynamic Spatial-Temporal Topology Graph Network (DST-GNet), which generates distinct topologies for different frames and channels and thereby explores abundant joint correlations in action sequences. Specifically, a data-driven Temporal Topology Matrix Learning (TTML) module is designed to capture intrinsic body correlations in each frame, boosting the model's generalization and flexibility. A Temporal-Specific Aggregation (TSA) module is also proposed for temporal-specific feature aggregation. Furthermore, we introduce an optimized Multi-Level Temporal Aggregation (MLTA) module to aggregate motion information along the temporal dimension, significantly reducing model complexity. Experimental results demonstrate that our DST-GNet achieves comparable performance on three large-scale 3D human action datasets: NTU-RGB+D, NTU-RGB+D 120 and Northwestern-UCLA.
computer science, information systems, theory & methods
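
The contrast the abstract draws between a shared topology and frame-specific topologies can be illustrated with a small sketch. This is not the authors' TTML or TSA implementation, which the abstract does not specify. It is a minimal pure-Python illustration that assumes a per-frame topology built from softmax-normalised dot-product similarity between joint features. All function names (frame_topology, shared_topology_gcn, dynamic_topology_gcn) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(adj, feats):
    """One graph aggregation step: out[i] = sum_j adj[i][j] * feats[j]."""
    n_joints = len(feats)
    n_channels = len(feats[0])
    return [
        [sum(adj[i][j] * feats[j][k] for j in range(n_joints))
         for k in range(n_channels)]
        for i in range(n_joints)
    ]


def frame_topology(feats):
    """Assumed data-driven topology for ONE frame (not the paper's TTML):
    each row is a softmax over dot-product similarities between joints."""
    return [
        softmax([sum(a * b for a, b in zip(fi, fj)) for fj in feats])
        for fi in feats
    ]


def shared_topology_gcn(adj, seq):
    """Conventional setting: the same adjacency is reused for every frame."""
    return [aggregate(adj, frame) for frame in seq]


def dynamic_topology_gcn(seq):
    """Frame-specific setting: a separate topology is derived per frame."""
    return [aggregate(frame_topology(frame), frame) for frame in seq]


# Toy sequence: 2 frames, 3 joints, 2 feature channels per joint.
toy_seq = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]],
]
# Fixed chain skeleton (joint 0 - joint 1 - joint 2) with self-loops,
# row-normalised so each joint averages over itself and its neighbours.
chain_adj = [
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.5, 0.5],
]

shared_out = shared_topology_gcn(chain_adj, toy_seq)
dynamic_out = dynamic_topology_gcn(toy_seq)
```

In the shared setting, chain_adj is applied unchanged to both frames, whereas dynamic_topology_gcn derives a different joint-to-joint weighting for each frame from that frame's own features. A channel-specific variant would additionally compute one topology per feature channel; that is omitted here to keep the sketch short.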