MKE-GCN: Multi-Modal Knowledge Embedded Graph Convolutional Network for Skeleton-Based Action Recognition in the Wild

Sen Yang, Xuanhan Wang, Lianli Gao, Jingkuan Song
DOI: https://doi.org/10.1109/icme52920.2022.9859787
2022-01-01
Abstract: Graph convolutional networks (GCNs), which model human body skeletons as spatial-temporal graphs, have been widely used and have become key to extracting representative features. However, existing methods are limited in recognizing actions in the wild, where human body skeletons are captured from real-world scenes with diverse viewpoints, pronounced motion blur, complex interactions and rapidly varying resolutions of the human body. In this paper, we propose a Multi-modal Knowledge Embedded Graph Convolutional Network (MKE-GCN), a conceptually simple yet effective method for skeleton-based action recognition in the wild. The proposed framework addresses two main problems: 1) how to design a simple yet effective pipeline for modeling multi-modal body skeletons; and 2) how to equip this pipeline with the ability to handle “in the wild” conditions. To tackle these problems, in MKE-GCN we first build an adaptive multi-modal aggregation (AMA) module and add it to traditional GCNs for multi-modal representation learning. We then further enhance the GCN model with a multi-modal knowledge distillation (MKD) strategy, in which the proposed MKE-GCN mines action recognition knowledge from various multi-modal models. We find that, aside from the multi-modal representation, MKD is of particular importance for improving the accuracy of skeleton-based action recognition “in the wild”. Notably, the proposed method is lightweight and can be applied to any GCN-based method. Furthermore, extensive experiments on three challenging benchmarks, namely UAV-Human, NTU-RGB+D 60 and NTU-RGB+D 120, demonstrate that our approach sets a new record for skeleton-based action recognition. Our anonymous code and models are also released.
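The abstract does not give the MKD objective, so the following is only a minimal sketch of a generic multi-teacher knowledge distillation loss, assuming temperature-softened KL divergence averaged over several teacher models plus a cross-entropy term on the ground-truth label. The function names, the `temperature` and `alpha` parameters, and the equal weighting of teachers are all illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_distillation_loss(student_logits, teacher_logits_list, label,
                                    temperature=2.0, alpha=0.5):
    """Hypothetical loss: cross-entropy on the label plus the average KL
    divergence from each teacher's softened prediction to the student's.

    Each entry of teacher_logits_list stands in for a model trained on a
    different skeleton modality (an assumption made for illustration).
    """
    # Supervised term: standard cross-entropy at temperature 1.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label] + 1e-12)

    # Distillation term: average KL(teacher || student) over all teachers.
    student_soft = softmax(student_logits, temperature)
    distill = 0.0
    for teacher_logits in teacher_logits_list:
        teacher_soft = softmax(teacher_logits, temperature)
        distill += kl_divergence(teacher_soft, student_soft)
    distill /= len(teacher_logits_list)

    # temperature**2 rescales the soft-target term, a common convention.
    return (1.0 - alpha) * cross_entropy + alpha * (temperature ** 2) * distill


if __name__ == "__main__":
    teachers = [[2.0, 0.0, 0.0], [1.5, 0.5, 0.0]]
    print(multi_teacher_distillation_loss([1.0, 0.2, 0.1], teachers, label=0))
```

In this sketch a student whose predictions agree with the teachers and the label incurs a smaller loss than one that disagrees, which is the behaviour any distillation objective of this kind is meant to encourage.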