Abstract: 3D skeleton-based action recognition (3D SAR) has gained significant attention within the computer vision community, owing to the inherent advantages of skeleton data. As a result, many impressive works, based on both conventional handcrafted features and learned feature extraction, have appeared over the years. However, prior surveys on action recognition have focused primarily on video or RGB-dominated approaches, with limited coverage of skeleton-based methods. Furthermore, despite the extensive application of deep learning in this field, there has been a notable absence of work that provides an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and the significance of 3D skeleton data as a valuable modality. We then provide a comprehensive introduction to mainstream action recognition techniques based on four fundamental deep architectures, i.e., Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers. The methods built on each architecture are then presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this survey represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.

End-to-end Learning of Deep Convolutional Neural Network for 3D Human Action Recognition

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

Skeleton-based Action Recognition Using LSTM and CNN

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Exploiting deep residual networks for human action recognition from skeletal data

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

3D Action Recognition Using Data Visualization and Convolutional Neural Networks

Human Action Recognition Using Deep Learning Methods

3D Convolutional Neural Network for Action Recognition

Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN)

A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Deep learning-based multi-view 3D-human action recognition using skeleton and depth data

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Investigation of Different Skeleton Features for CNN-based 3D Action Recognition

Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data