Abstract: The computer vision community is currently focused on action recognition in real-world videos, where datasets contain thousands of samples and pose many challenges. Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state of the art in vision-based action recognition systems. Recently, Residual Networks (ResNets), which add residual connections to a more traditional CNN model within a single architecture, have shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets to human action recognition using skeletal data provided by depth sensors. First, the 3D coordinates of the human body joints in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images capture the spatio-temporal evolution of 3D motions in skeleton sequences and can be learned efficiently by D-CNNs. We then propose a novel deep learning architecture based on ResNets that learns features from the resulting color-based representations and classifies them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all three benchmarks while requiring fewer computational resources. In particular, it surpasses previous approaches by margins of 3.4% on MSR Action 3D, 0.67% on KARD, and 2.5% on NTU-RGB+D.
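
To illustrate the skeleton-to-image step described in the abstract, the following is a minimal sketch and not the authors' exact encoding. It assumes that the (x, y, z) coordinates are min-max normalized over the whole sequence and mapped to the (R, G, B) channels, with joints laid out as image rows and frames as image columns. The function name `skeleton_to_rgb`, the normalization scheme, and the row/column layout are all illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = List[Joint]                  # all joints observed in one frame
Pixel = Tuple[int, ...]              # (R, G, B) values in 0..255


def skeleton_to_rgb(sequence: List[Frame]) -> List[List[Pixel]]:
    """Encode a skeleton sequence as an RGB image.

    Assumed layout (hypothetical): image[j][t] is the pixel for joint j
    at frame t, so rows index joints and columns index time.
    """
    if not sequence or not sequence[0]:
        raise ValueError("sequence needs at least one frame with one joint")
    num_joints = len(sequence[0])
    if any(len(frame) != num_joints for frame in sequence):
        raise ValueError("every frame must contain the same number of joints")

    # Assumption: one global min/max over all coordinates of the sequence.
    values = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence

    def to_byte(value: float) -> int:
        # Map a coordinate linearly from [lo, hi] to the integer range 0..255.
        return int(round(255 * (value - lo) / span))

    return [
        [tuple(to_byte(c) for c in sequence[t][j]) for t in range(len(sequence))]
        for j in range(num_joints)
    ]
```

Under these assumptions, a sequence of T frames with J joints yields a J x T color image, which could then be resized and passed to an image classifier such as a ResNet.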

Deep Stacked Bidirectional LSTM Neural Network for Skeleton-Based Action Recognition

Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning

Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network

Explorations of Skeleton Features for LSTM-based Action Recognition

Spatio-temporal stacking model for skeleton-based action recognition

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

DB-LSTM: Densely-connected Bi-directional LSTM for Human Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition

A New Representation of Skeleton Sequences for 3D Action Recognition

Exploiting deep residual networks for human action recognition from skeletal data

Temporal Enhanced Multi-Stream Graph Convolutional Neural Networks for Skeleton-Based Action Recognition

Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action Recognition

Skeleton-Based Action Recognition With Directed Graph Neural Networks

Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition