Temporal Decoupling Graph Convolutional Network for Skeleton-Based Gesture Recognition

Jinfu Liu,Xinshun Wang,Can Wang,Yuan Gao,Mengyuan Liu
DOI: https://doi.org/10.1109/tmm.2023.3271811
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract:Skeleton-based gesture recognition methods have achieved high success using Graph Convolutional Network (GCN), which commonly uses an adjacency matrix to model the spatial topology of skeletons. However, previous methods use the same adjacency matrix for skeletons from different frames, which limits the flexibility of GCN to model temporal information. To solve this problem, we propose a Temporal Decoupling Graph Convolutional Network (TD-GCN), which applies different adjacency matrices for skeletons from different frames. The main steps of each convolution layer in our proposed TD-GCN are as follows. To extract deep spatiotemporal information from skeleton joints, we first extract high-level spatiotemporal features from skeleton data. Then, channel-dependent and temporal-dependent adjacency matrices corresponding to different channels and frames are calculated to capture the spatiotemporal dependencies between skeleton joints. Finally, to fuse topology information from neighbor skeleton joints, spatiotemporal features of skeleton joints are fused based on channel-dependent and temporal-dependent adjacency matrices. To the best of our knowledge, we are the first to use temporal-dependent adjacency matrices for temporal-sensitive topology learning from skeleton joints. The proposed TD-GCN effectively improves the modeling ability of GCN and achieves state-of-the-art results on gesture datasets including SHREC'17 Track and DHG-14/28.
What problem does this paper attempt to address?