Abstract: 3D skeleton-based action recognition (3D SAR) has gained significant attention within the computer vision community, owing to the inherent advantages of skeleton data. As a result, many impressive works have appeared over the years, including methods based on conventional handcrafted features and methods based on learned feature extraction. However, prior surveys on action recognition have focused primarily on video or RGB-dominated approaches, with limited coverage of skeleton data. Furthermore, despite the extensive application of deep learning methods in this field, no work has yet provided an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and the significance of 3D skeleton data as a valuable modality. We then provide a comprehensive introduction to mainstream action recognition techniques based on four fundamental deep architectures: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers. The methods built on each architecture are presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this survey is the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.

3D Action Recognition Using Multi-Temporal Skeleton Visualization

3D Action Recognition Using Data Visualization and Convolutional Neural Networks

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

Action Recognition Based on Global Optimal Similarity Measuring

A New Representation of Skeleton Sequences for 3D Action Recognition

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton Based Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

Enhanced Skeleton Visualization for View Invariant Human Action Recognition

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

3D Action Recognition Using Multi-Temporal Depth Motion Maps and Fisher Vector

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Multiple temporal scale aggregation graph convolutional network for skeleton-based action recognition

Accurate And Real-Time Human Action Recognition Based On 3d Skeleton

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Spectral studies on metal-ligand bonding of novel rhodanine azodye sulphadrugs

Deep learning-based multi-view 3D-human action recognition using skeleton and depth data

Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

Dual-Excitation Spatial–Temporal Graph Convolution Network for Skeleton-Based Action Recognition