Action capsules: Human skeleton action recognition

Ali Farajzadeh Bavil, Hamed Damirchi, Hamid D. Taghirad
DOI: https://doi.org/10.1016/j.cviu.2023.103722
IF: 4.886
2023-05-17
Computer Vision and Image Understanding
Abstract: Owing to the compact and rich high-level representations it offers, skeleton-based human action recognition has recently attracted growing attention. Although investigating joint relationships in the spatial and temporal dimensions provides information critical to action recognition, effectively encoding the global dependencies of joints during spatio-temporal feature extraction remains a prohibitively difficult task. In this paper, we introduce Action Capsule, which identifies action-related key joints by considering the latent correlation of joints in a skeleton sequence. We show that, during inference, our end-to-end network attends to a set of joints specific to each action, whose encoded spatio-temporal features are aggregated to recognize the action. Additionally, the use of multiple stages of action capsules enhances the network's ability to distinguish similar actions. We give a comparative analysis of our capsule-based approach against other widely used methods in skeleton action recognition, highlighting the advantage of the proposed approach in handling missing skeleton data by leveraging iterative processing. Our network outperforms state-of-the-art approaches on the N-UCLA dataset and obtains competitive results on the NTURGBD dataset, while having significantly lower computational requirements as measured in GFLOPs.
computer science, artificial intelligence, engineering, electrical & electronic
What problem does this paper attempt to address?