Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren, Hao Wang, Yunlong Wang, Zhenan Sun
2024-11-28
Abstract: In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
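To make "a combination of learnable prototypes" concrete, the sketch below re-expresses each feature vector as a softmax-weighted sum of a small bank of prototype vectors. This is an illustrative assumption, not the authors' implementation (see the linked repository for that): the names `prototype_weights`, `reconstruct`, `reconstruct_sequence`, and the scoring matrix `query_weight` are hypothetical, and in a real GCN both `prototypes` and `query_weight` would be trained tensors rather than fixed lists.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_weights(feature: Vector, query_weight: Matrix) -> Vector:
    """Affinity of one feature to each prototype, normalized to sum to 1.

    query_weight is a hypothetical learnable matrix with one row per prototype;
    row k scores how strongly the feature should draw on prototype k.
    """
    scores = [sum(w * x for w, x in zip(row, feature)) for row in query_weight]
    return softmax(scores)


def reconstruct(feature: Vector, prototypes: Matrix, query_weight: Matrix) -> Vector:
    """Rebuild a feature as a convex combination of the prototype vectors."""
    weights = prototype_weights(feature, query_weight)
    dim = len(prototypes[0])
    return [sum(a * p[d] for a, p in zip(weights, prototypes)) for d in range(dim)]


def reconstruct_sequence(features: Matrix, prototypes: Matrix, query_weight: Matrix) -> Matrix:
    """Apply the reconstruction to every feature (e.g. one per joint or per frame)."""
    return [reconstruct(f, prototypes, query_weight) for f in features]


if __name__ == "__main__":
    # Toy sizes: 3 prototypes of dimension 2, input features of dimension 2.
    prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    query_weight = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
    features = [[1.0, 0.2], [0.1, 0.9]]
    for f, r in zip(features, reconstruct_sequence(features, prototypes, query_weight)):
        print(f, "->", [round(v, 3) for v in r])
```

Because the softmax weights are non-negative and sum to 1, every reconstruction in this sketch lies inside the convex hull of the prototypes, so each feature can only be expressed through the shared bank of motion patterns.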
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to distinguish actions with similar joint trajectories in skeleton-based action recognition. Because skeleton representations lack image-level details, telling such actions apart depends on subtle motion details of specific body parts. The paper argues that existing methods struggle to capture these fine-grained details of key body parts and therefore perform poorly on similar actions. To address this, the authors propose ProtoGCN, which decomposes the dynamics of an entire skeleton sequence into a combination of learnable prototypes representing the core motion patterns of action units, and uses this decomposition to identify and enhance the discriminative representations of similar actions.
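The abstract attributes ProtoGCN's ability to separate similar actions to contrasting the prototype reconstructions, but it does not state the objective. As an illustration only, the sketch below uses a class-center InfoNCE-style loss over reconstructed features: each sample is pulled toward its own class center and pushed away from the others. The functions `class_centers` and `contrastive_loss`, the cosine similarity, and the temperature `tau` are all assumptions for this example, and the paper's actual loss may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def class_centers(features: List[Vector], labels: List[int], num_classes: int) -> List[Vector]:
    """Mean feature per class (assumes every class has at least one sample)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += f[d]
    return [[s / counts[k] for s in sums[k]] for k in range(num_classes)]


def contrastive_loss(z: Vector, label: int, centers: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style loss: own class center is the positive, all others are negatives.

    Computes -log(exp(s_y / tau) / sum_k exp(s_k / tau)) with s_k = cos(z, c_k),
    using a log-sum-exp shift for numerical stability.
    """
    logits = [cosine(z, c) / tau for c in centers]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - logits[label]


if __name__ == "__main__":
    # Two hypothetical classes of similar actions whose reconstructed features differ slightly.
    feats = [[1.0, 0.1], [0.9, 0.2], [0.2, 1.0], [0.1, 0.8]]
    labels = [0, 0, 1, 1]
    centers = class_centers(feats, labels, num_classes=2)
    batch_loss = sum(contrastive_loss(f, y, centers) for f, y in zip(feats, labels)) / len(feats)
    print(round(batch_loss, 4))
```

A smaller `tau` sharpens the softmax, so near-miss negatives (the centers of similar actions) dominate the loss; that is the usual reason to contrast against class centers when classes are easily confused.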