Spatial-Temporal Adaptive Metric Learning Network for One-Shot Skeleton-Based Action Recognition

Xuanfeng Li, Jian Lu, Xiaogai Chen, Xiaodan Zhang
DOI: https://doi.org/10.1109/lsp.2024.3351070
2024-02-02
IEEE Signal Processing Letters
Abstract: To fully leverage labeled action data, one-shot learning solutions have recently been designed for skeleton-based human action recognition. These solutions tend to employ deep metric learning methods to enlarge inter-class distances while suppressing intra-class variations in the embedding space. However, a notable issue is that even actions from the same class may exhibit different characteristics due to individual differences among performers. Consequently, suppressing intra-class variations may hinder the model's ability to generalize to unseen classes. To alleviate this problem, we propose a spatial-temporal adaptive metric learning network. At its core, our method learns a pair of sub-embeddings, one attending to spatial information and the other to temporal information, with the aim of increasing the diversity of embeddings and preserving intra-class differences. Moreover, because different samples require spatial and temporal information to varying degrees, an adaptive weight assignment module is designed to allocate attention weights to each sub-embedding. Experiments on the NTU-RGB+D 120 dataset indicate that our method provides a stronger embedding space than other state-of-the-art methods.
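The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' network. It assumes per-frame, per-joint features from an unspecified backbone are already computed, uses simple pooling as stand-ins for the learned spatial and temporal sub-embeddings, and replaces the adaptive weight assignment module with a hypothetical softmax gate (parameters `gate_s`, `gate_t`). Classification follows a common one-shot protocol, assumed here: a query takes the label of its nearest single exemplar in the embedding space. All function names are hypothetical, and every sequence must share the same numbers of frames, joints and channels, because the temporal sub-embedding concatenates frames.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input layout: features from an unspecified backbone,
# indexed as feats[t][j][c] (T frames, J joints, C channels).
Features = List[List[List[float]]]


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spatial_sub_embedding(feats: Features) -> List[float]:
    """Stand-in: average each joint over time, then concatenate joints.
    Keeps which joint carries which feature, discards frame order."""
    T, J, C = len(feats), len(feats[0]), len(feats[0][0])
    out = [sum(feats[t][j][c] for t in range(T)) / T
           for j in range(J) for c in range(C)]
    return l2_normalize(out)


def temporal_sub_embedding(feats: Features) -> List[float]:
    """Stand-in: average each frame over joints, then concatenate frames.
    Keeps frame order, discards joint identity."""
    T, J, C = len(feats), len(feats[0]), len(feats[0][0])
    out = [sum(feats[t][j][c] for j in range(J)) / J
           for t in range(T) for c in range(C)]
    return l2_normalize(out)


def adaptive_weights(feats: Features, gate_s: List[float],
                     gate_t: List[float]) -> Tuple[float, float]:
    """Hypothetical gate: globally pooled feature -> two logits -> softmax,
    giving per-sample weights (w_s, w_t) that sum to 1."""
    T, J, C = len(feats), len(feats[0]), len(feats[0][0])
    g = [sum(feats[t][j][c] for t in range(T) for j in range(J)) / (T * J)
         for c in range(C)]
    logit_s = sum(a * b for a, b in zip(gate_s, g))
    logit_t = sum(a * b for a, b in zip(gate_t, g))
    m = max(logit_s, logit_t)  # subtract the max for numerical stability
    es, et = math.exp(logit_s - m), math.exp(logit_t - m)
    return es / (es + et), et / (es + et)


def embed(feats: Features, gate_s: List[float],
          gate_t: List[float]) -> List[float]:
    """Concatenate the two sub-embeddings, each scaled by its adaptive weight."""
    w_s, w_t = adaptive_weights(feats, gate_s, gate_t)
    e_s = spatial_sub_embedding(feats)
    e_t = temporal_sub_embedding(feats)
    return [w_s * x for x in e_s] + [w_t * x for x in e_t]


def one_shot_classify(query: Features, support: Dict[str, Features],
                      gate_s: List[float], gate_t: List[float]) -> str:
    """Return the label of the nearest single exemplar (Euclidean distance)."""
    q = embed(query, gate_s, gate_t)
    best_label, best_dist = "", math.inf
    for label, exemplar in support.items():
        d = math.dist(q, embed(exemplar, gate_s, gate_t))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    # Toy data with T = 2 frames, J = 2 joints, C = 2 channels.
    wave = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    kick = [[[1.0, 1.0], [0.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
    support = {"wave": wave, "kick": kick}
    gate_s, gate_t = [1.0, 0.0], [0.0, 1.0]  # hypothetical gate parameters
    noisy_wave = [[[0.9, 0.1], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    print(one_shot_classify(noisy_wave, support, gate_s, gate_t))
```

Running the script prints `wave` for the slightly perturbed query. Because the weights are computed per sample, a sequence whose pooled features align more with `gate_s` leans on the spatial sub-embedding, and vice versa. In a trained network, the sub-embedding extractors and the weighting module would be learned with a metric-learning loss rather than fixed by hand as they are here.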
Subject area: Engineering, Electrical & Electronic