Abstract: Action recognition is essential for many human-centered applications in the Internet of Things (IoT). In particular, in the Internet of Medical Things (IoMT), action recognition is important for surgical assistance, patient monitoring, and similar tasks. Recently, 3-D skeleton sequence-based action recognition has drawn broad attention. It is a challenging task that requires effective modeling of intraframe skeleton representations and interframe temporal dynamics. Standard long short-term memory (LSTM)-based models are widely used for sequence modeling because of their long-term memory, yet they cannot fully model the relationships between different body joints or persons to extract crucial co-occurrence features at different levels. To address this shortcoming, we propose an attention-based multilevel co-occurrence graph convolutional LSTM (AMCGC-LSTM). By integrating graph convolutional networks (GCNs) into LSTM, the proposed model can leverage body structural information from skeletons and strengthen multilevel co-occurrence (MC) feature learning. Specifically, we first design a spatial attention module that enhances the features of key joints in the skeleton inputs. Second, we design MC memory units coupled with GCN to automatically model the spatial relationships between joints and simultaneously capture co-occurrence features across different joints, persons, and frames. Finally, we construct aggregated features of MCs (AFMCs) from the MC memory units to better represent the intraframe action context encoding, and use a concurrent LSTM (Co-LSTM) to further model their temporal dynamics for action recognition. Our model significantly outperforms mainstream methods on the NTU RGB+D 60/120 datasets, the mutual action subsets of NTU RGB+D 60/120, and the Northwestern-UCLA dataset.
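As a rough illustration of two ideas named in the abstract (attention over joints and graph convolution over the skeleton), the following NumPy sketch applies a softmax spatial attention to the joints of one frame and then runs a single graph-convolution step. It is not the AMCGC-LSTM model: the 5-joint toy skeleton, the random weights, the residual-style reweighting `x * (1 + att)`, and the function names are all assumptions made for illustration, and the LSTM gating, MC memory units, and Co-LSTM are omitted.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a skeleton graph."""
    a = np.eye(num_joints)                      # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # degrees are >= 1 due to self-loops
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spatial_attention(x, w_att):
    """Softmax over joints: one non-negative weight per joint, summing to 1."""
    scores = x @ w_att                          # shape (num_joints,)
    scores = scores - scores.max()              # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def graph_conv(x, a_hat, w):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)

# Toy data (assumed): 5 joints with 3-D coordinates, joint 1 acts as a hub.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
num_joints, in_dim, out_dim = 5, 3, 8

x = rng.normal(size=(num_joints, in_dim))       # one frame of joint features
w_att = rng.normal(size=in_dim)                 # hypothetical attention weights
w = rng.normal(size=(in_dim, out_dim))          # hypothetical GCN weights

a_hat = normalized_adjacency(edges, num_joints)
att = spatial_attention(x, w_att)
x_enhanced = x * (1.0 + att[:, None])           # emphasize high-attention joints
h = graph_conv(x_enhanced, a_hat, w)            # shape (num_joints, out_dim)
```

In a graph convolutional LSTM, an operation like `graph_conv` would typically replace the dense matrix products inside the LSTM gates, so that each gate mixes information along skeleton edges at every time step.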

Sequential learning for multimodal 3D human activity recognition with Long-Short Term Memory

LSTM with Uniqueness Attention for Human Activity Recognition

Semi‐supervised Long Short‐term Memory for Human Action Recognition

Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Deep Dilation on Multimodality Time Series for Human Activity Recognition

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Learning Dynamic Spatio-Temporal Relations for Human Activity Recognition

Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Lattice Long Short-Term Memory for Human Action Recognition

DB-LSTM: Densely-connected Bi-directional LSTM for Human Action Recognition

Context-Associative Hierarchical Memory Model for Human Activity Recognition and Prediction

3D Human Action Recognition with Skeleton Orientation Vectors and Stacked Residual Bi-LSTM

Multisource Learning for Skeleton-Based Action Recognition Using Deep LSTM and CNN

3D Human Action Recognition with Siamese-LSTM Based Deep Metric Learning

Human Activity Recognition Based on Wearable Sensor Using Hierarchical Deep LSTM Networks

Human Action Recognition Based on Selected Spatio-Temporal Features Via Bidirectional LSTM

Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition

Hierarchical Long Short-Term Concurrent Memory for Human Interaction Recognition

Human Activity Recognition based on Dynamic Spatio-Temporal Relations

Human-Robot Collaboration by Intention Recognition Using Deep LSTM Neural Network

Deep Stacked Bidirectional LSTM Neural Network for Skeleton-Based Action Recognition