Abstract: Action recognition is essential for many human-centered applications in the Internet of Things (IoT). In particular, in the Internet of Medical Things (IoMT), action recognition is important for surgical assistance, patient monitoring, and similar tasks. Recently, 3-D skeleton sequence-based action recognition has drawn broad attention. It is a challenging task that requires effective modeling of both intraframe skeleton representations and interframe temporal dynamics. Standard long short-term memory (LSTM)-based models are widely used for sequence modeling because of their long-term memory, yet they cannot fully model the relationships between different body joints or persons, and so fail to extract crucial co-occurrence features at different levels. To address this shortcoming, we propose an attention-based multilevel co-occurrence graph convolutional LSTM (AMCGC-LSTM). By integrating graph convolutional networks (GCNs) into the LSTM, the proposed model leverages body structural information from skeletons and strengthens multilevel co-occurrence (MC) feature learning. Specifically, we first design a spatial attention module that enhances the features of key joints in the skeleton inputs. Second, we design MC memory units coupled with a GCN to automatically model the spatial relationships between joints while simultaneously capturing co-occurrence features across different joints, persons, and frames. Finally, we construct aggregated features of MCs (AFMCs) from the MC memory units to better encode the intraframe action context, and use a concurrent LSTM (Co-LSTM) to further model their temporal dynamics for action recognition. Our model significantly outperforms mainstream methods on the NTU RGB+D 60/120 data sets, the mutual action subsets of the NTU RGB+D 60/120 data sets, and the Northwestern-UCLA data set.
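To make the first two ingredients named in the abstract more concrete, the following is a minimal sketch of spatial attention over joints followed by a single graph convolution on one skeleton frame. It is not the authors' AMCGC-LSTM: the softmax attention form, the symmetric adjacency normalization (a common GCN formulation), the 5-joint toy chain, and all function and variable names are assumptions chosen for illustration, and the LSTM gating, MC memory units, and Co-LSTM are omitted.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    adj = np.eye(num_joints)                      # self-loops
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0               # bones are undirected
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def spatial_attention(joints, w_att):
    """Reweight joints with softmax scores. joints: (J, C), w_att: (C,)."""
    scores = joints @ w_att
    scores = scores - scores.max()                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return joints * weights[:, None], weights

def graph_conv(joints, adj_norm, w):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ joints @ w, 0.0)

# Hypothetical toy skeleton: 5 joints connected in a chain, 3-D coordinates.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
num_joints, in_dim, out_dim = 5, 3, 8

frame = rng.normal(size=(num_joints, in_dim))     # one skeleton frame
adj_norm = normalized_adjacency(edges, num_joints)
attended, weights = spatial_attention(frame, rng.normal(size=in_dim))
features = graph_conv(attended, adj_norm, rng.normal(size=(in_dim, out_dim)))
```

In a graph convolutional LSTM, an operation like `graph_conv` would replace the dense matrix products inside the LSTM gates, so that each gate mixes information along the skeleton's bones; here the weights are random and untrained, so the output only illustrates shapes and data flow.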

First-Person Hand Action Recognition Using Multimodal Data

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

On the Utility of 3D Hand Poses for Action Recognition

Handcrafted Vs. Learned Representations for Human Action Recognition

Reassessing Hierarchical Representation for Action Recognition in Still Images

Human Action Recognition: Pose-based Attention draws focus to Hands

Human Action Recognition Based on Three-Stream Network with Frame Sequence Features

Combining ConvNets with Hand-Crafted Features for Action Recognition Based on an HMM-SVM Classifier

Hand-Object Interaction and Precise Localization in Transitive Action Recognition

Pose-conditioned Spatio-Temporal Attention for Human Action Recognition

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Exploiting deep residual networks for human action recognition from skeletal data

The diffusion coefficient of a swollen microgel particle.

A Two-stream Neural Network for Pose-based Hand Gesture Recognition

RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks

Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition

Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition

In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition

Human-centric multimodal fusion network for robust action recognition

Action Recognition in RGB-D Egocentric Videos