Temporal Refinement Graph Convolutional Network for Skeleton-based Action Recognition
Tianming Zhuang, Zhen Qin, Yi Ding, Fuhu Deng, Leduo Chen, Zhiguang Qin, Kim-Kwang Raymond Choo
DOI: https://doi.org/10.1109/tai.2023.3329799
2024-01-01
IEEE Transactions on Artificial Intelligence
Abstract: Human skeleton data, widely used for human activity recognition, is arguably among the most representative biometric characteristics because it is intuitive and easy to visualize. State-of-the-art approaches mainly focus on improving the modeling of spatial correlations within graph topologies. However, inter-frame motion representations are also of vital importance, and we argue that they deserve further attention and exploration. We therefore propose a temporal refinement module with a contrastive learning mechanism, fused as a complement to the conventional spatial graph convolution layer. In addition, to further exploit inter-frame variances and generalize the GCN operation to the temporal dimension, we introduce a temporal-correlation matrix that captures dynamic dependencies within frame pairs, enhancing semantic feature representation. Moreover, since GCN is a typical local operator that lacks the capability to fully model long-term relations along spatial and temporal variation, we design a spatial-temporal cascaded aggregation module that enlarges the receptive filter scale to move beyond this limitation. The overall recognition framework combines these three contributions and achieves strong performance on benchmark datasets (i.e., NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, and Kinetics Skeleton 400). Extensive experiments demonstrate the effectiveness of the proposed framework, e.g., it attains recognition accuracies of 90.9% and 96.8% on NTU RGB+D 60, and 87.9% and 88.9% on NTU RGB+D 120.
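
The idea of generalizing graph convolution to the temporal dimension through a frame-pair correlation matrix can be illustrated with a minimal NumPy sketch. The abstract does not specify how the matrix is built, so the scaled dot-product similarity with softmax normalization used here is an assumption chosen only for illustration, and the function names `temporal_correlation` and `temporal_graph_conv` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def temporal_correlation(x):
    """Build a (T, T) matrix of frame-pair affinities.

    x has shape (T, V, C): T frames, V joints, C channels.
    ASSUMPTION: affinities come from scaled dot-product similarity
    followed by a row-wise softmax; the paper may define them differently.
    """
    num_frames = x.shape[0]
    flat = x.reshape(num_frames, -1)                  # one vector per frame
    scores = flat @ flat.T / np.sqrt(flat.shape[1])   # pairwise frame similarity
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1


def temporal_graph_conv(x, w):
    """Aggregate features across frames, then project the channels.

    w has shape (C, C_out). Frames play the role of graph nodes and the
    correlation matrix plays the role of the adjacency matrix.
    """
    a = temporal_correlation(x)                 # (T, T)
    mixed = np.einsum("ts,svc->tvc", a, x)      # weighted sum over frames
    return mixed @ w                            # (T, V, C_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 25, 3))   # 8 frames, 25 joints, 3D coordinates
    w = rng.standard_normal((3, 16))
    print(temporal_graph_conv(x, w).shape)
```

In this sketch every frame attends to every other frame, which is one way a temporal operator can reach beyond a fixed local window; a spatial graph convolution over joints would be applied separately.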