A Cuboid CNN Model with an Attention Mechanism for Skeleton-Based Action Recognition

Kaijun Zhu,Ruxin Wang,Qingsong Zhao,Jun Cheng,Dapeng Tao
DOI: https://doi.org/10.1109/tmm.2019.2962304
IF: 7.3
2020-01-01
IEEE Transactions on Multimedia
Abstract:The introduction of depth sensors such as Microsoft Kinect have driven research in human action recognition. Human skeletal data collected from depth sensors convey a significant amount of information for action recognition. While there has been considerable progress in action recognition, most existing skeleton-based approaches neglect the fact that not all human body parts move during many actions, and they fail to consider the ordinal positions of body joints. Here, and motivated by the fact that an action's category is determined by local joint movements, we propose a cuboid model for skeleton-based action recognition. Specifically, a cuboid arranging strategy is developed to organize the pairwise displacements between all body joints to obtain a cuboid action representation. Such a representation is well structured and allows deep CNN models to focus analyses on actions. Moreover, an attention mechanism is exploited in the deep model, such that the most relevant features are extracted. Extensive experiments on our new Yunnan University-Chinese Academy of Sciences-Multimodal Human Action Dataset (CAS-YNU MHAD), the NTU RGB+D dataset, the UTD-MHAD dataset, and the UTKinect-Action3D dataset demonstrate the effectiveness of our method compared to the current state-of-the-art.
What problem does this paper attempt to address?