MKTZ: multi-semantic embedding and key frame masking techniques for zero-shot skeleton action recognition

Hongwei Chen, Sheng Guo, Zexi Chen
DOI: https://doi.org/10.1007/s00530-024-01592-6
IF: 3.9
2024-12-05
Multimedia Systems
Abstract: The fundamental task of zero-shot skeleton-based action recognition is to learn existing skeletal actions during the training phase and to accurately identify unseen actions during the inference phase. The key challenge lies in effectively unifying skeletal features and semantic features. Traditional zero-shot skeleton recognition methods often emphasize the distribution alignment between skeletal and semantic information, thereby neglecting the crucial role that rich semantic and action information plays in guiding skeleton-based action recognition. To enable the model to better focus on semantic features, this paper proposes a Multi-Semantic Embedding (MSE) module. This module uses multiple textual descriptions as semantic guidance to distinguish more clearly between different semantic information, thus improving the efficiency of skeleton feature learning. Additionally, to further enhance the model's generalization ability, this paper introduces a Key Frame Masking (KFM) module. The KFM module selects key frames by calculating the unidirectional motion between frames in the skeletal sequence. By masking these key frames, it generates sequence samples with information loss, which are then compared with the negative samples, so that the model focuses more on semantic features and relies less on the information carried by the key frames of the skeletal sequence. The proposed method was evaluated on three skeletal datasets, achieving an accuracy of 83.34% on the NTU-RGB+D 60 dataset, 57.60% on the NTU-RGB+D 120 dataset, and 78.62% on the PKU-MMD dataset. The experimental results validate the effectiveness of the proposed method in enhancing zero-shot skeleton action recognition.
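The key frame masking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact motion measure or masking rule, so everything below is an assumption made for illustration: motion is taken as the summed Euclidean displacement of all joints relative to the previous frame, the frames with the largest motion are treated as key frames, and masking replaces their joints with zeros. The function names (`frame_motion`, `select_key_frames`, `mask_key_frames`) are hypothetical and are not taken from the paper.

```python
import math


def frame_motion(prev_frame, frame):
    """Sum of Euclidean joint displacements from prev_frame to frame (assumed motion measure)."""
    return sum(math.dist(p, q) for p, q in zip(prev_frame, frame))


def select_key_frames(sequence, num_key_frames):
    """Return the sorted indices of the frames with the largest motion score."""
    # Frame 0 has no predecessor, so its score is set to 0.0 (an assumption).
    scores = [0.0] + [
        frame_motion(sequence[t - 1], sequence[t]) for t in range(1, len(sequence))
    ]
    ranked = sorted(range(len(sequence)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:num_key_frames])


def mask_key_frames(sequence, key_indices):
    """Return a copy of the sequence in which the key frames are zeroed out."""
    masked = set(key_indices)
    return [
        [(0.0, 0.0, 0.0)] * len(frame) if t in masked else list(frame)
        for t, frame in enumerate(sequence)
    ]


# Toy sequence: 5 frames, 2 joints per frame, (x, y, z) coordinates.
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # no motion
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],  # joint 0 moves by 5
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],  # no motion
    [(3.0, 4.0, 0.0), (1.0, 2.0, 0.0)],  # joint 1 moves by 2
]

key_indices = select_key_frames(sequence, num_key_frames=2)
masked_sequence = mask_key_frames(sequence, key_indices)
print(key_indices)
```

In this toy input, the two frames with the largest displacement are selected and zeroed, while the remaining frames are copied unchanged. A real pipeline would presumably operate on batched tensors and feed the masked sequence to the skeleton encoder; that part is outside what the abstract specifies.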
computer science, information systems, theory & methods