Abstract: 3D skeleton data is widely used in action recognition because skeleton-based methods perform well in complex dynamic environments. The rise of spatio-temporal graph convolutions has drawn much attention to using graph convolution to extract spatial and temporal features jointly in skeleton-based action recognition. However, because spatial and temporal features differ greatly in what they focus on, it is difficult to improve the efficiency of extracting spatio-temporal features. In this paper, we propose a channel attention and multi-scale neural network (CA-MSN) for skeleton-based action recognition, built from a series of spatio-temporal extraction modules. We exploit the relationships among body joints hierarchically through two modules: a spatial module, which uses a residual GCN with a channel attention block to extract high-level spatial features, and a temporal module, which uses a multi-scale TCN to extract temporal features at different scales. We perform extensive experiments on both the NTU-RGBD60 and NTU-RGBD120 datasets to verify the effectiveness of our network. The comparison results show that our method achieves state-of-the-art performance with competitive computing speed. To test the CA-MSN model in application, we design a multi-task tandem network consisting of 2D pose estimation, 2D-to-3D pose regression, and skeleton-based action recognition models, and we demonstrate end-to-end recognition from RGB video to action type. The code is available at https://github.com/Rh-Dang/CA-MSN-action-recognition.git.
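
The two modules named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes a squeeze-and-excitation style channel attention, a symmetrically normalized adjacency matrix, and temporal branches that differ only in dilation and are averaged. All function names, shapes, the toy 5-joint skeleton, and the random weights are hypothetical.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention (an assumption) on features x of shape (C, T, V)."""
    squeeze = x.mean(axis=(1, 2))                  # (C,) average over frames and joints
    hidden = np.maximum(w1 @ squeeze, 0.0)         # (C // r,) bottleneck with ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,) sigmoid gate in (0, 1)
    return x * scale[:, None, None]

def graph_conv(x, adj, weight):
    """Aggregate features over joints with adj, then mix channels with weight."""
    aggregated = np.einsum("ctv,vw->ctw", x, adj)        # (C_in, T, V)
    return np.einsum("oc,ctw->otw", weight, aggregated)  # (C_out, T, V)

def spatial_block(x, adj, weight, w1, w2):
    """Residual GCN block followed by channel attention (C_in == C_out assumed)."""
    out = np.maximum(graph_conv(x, adj, weight), 0.0)
    return x + channel_attention(out, w1, w2)

def temporal_conv(x, kernel, dilation):
    """1D convolution along time with zero padding; odd kernel length assumed."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    t = x.shape[1]
    out = np.zeros_like(x)
    for i, w in enumerate(kernel):
        out += w * padded[:, i * dilation : i * dilation + t, :]
    return out

def multi_scale_temporal(x, kernel, dilations):
    """Average temporal branches that share a kernel but use different dilations."""
    branches = [temporal_conv(x, kernel, d) for d in dilations]
    return np.mean(branches, axis=0)

# Toy input: 8 channels, 16 frames, 5 joints (hypothetical star-shaped skeleton).
rng = np.random.default_rng(0)
C, T, V, r = 8, 16, 5, 4
x = rng.standard_normal((C, T, V))

adj = np.eye(V)                                    # self-loops
for i, j in [(0, 1), (1, 2), (1, 3), (1, 4)]:
    adj[i, j] = adj[j, i] = 1.0
deg = adj.sum(axis=1)
adj = adj / np.sqrt(np.outer(deg, deg))            # D^{-1/2} (A + I) D^{-1/2}

weight = rng.standard_normal((C, C)) * 0.1
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

spatial = spatial_block(x, adj, weight, w1, w2)
y = multi_scale_temporal(spatial, np.array([0.25, 0.5, 0.25]), dilations=[1, 2, 4])
print(y.shape)
```

Keeping the spatial and temporal steps as separate functions mirrors the abstract's point that the two kinds of features are extracted by distinct modules instead of a single joint spatio-temporal convolution.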

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

End-to-end Learning of Deep Convolutional Neural Network for 3D Human Action Recognition

3D Action Recognition Using Data Visualization and Convolutional Neural Networks

Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network

A New Representation of Skeleton Sequences for 3D Action Recognition

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

3D Action Recognition Using Multi-Temporal Skeleton Visualization

Investigation of Different Skeleton Features for CNN-based 3D Action Recognition

Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection

Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data

Temporal Enhanced Multi-Stream Graph Convolutional Neural Networks For Skeleton-Based Action Recognition

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks

Channel attention and multi-scale graph neural networks for skeleton-based action recognition

Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints