Abstract: A novel 2D human skeleton action recognition model with spatial constraints, named 2D‐SCHAR, is introduced to address the ambiguity and uncertainty of human action recognition in 2D surveillance videos. In video surveillance scenarios, human actions are predominantly captured in 2D, which makes it hard to determine action details that are not apparent in the 2D data. These issues stem from the absence of depth information in the action data, so we concentrate on two main challenges, depth estimation and spatial transformation, to improve recognition accuracy. Depth estimation can aid human action recognition and improve the accuracy of neural-network-based methods. However, estimating depth from images requires extensive computational resources and cannot exploit the connectivity between parts of the human body. In addition, the estimated depth may not reflect actual depth ranges, so its reliability needs to be improved. 2D‐SCHAR employs graph convolution networks to process graph‐structured human skeleton action data and comprises three parts: depth estimation, spatial transformation, and action recognition. The depth estimation component infers 3D action data from 2D skeleton inputs, and the spatial transformation component uses spatial constraints to generate transformation parameters that adjust and rectify abnormal deviations in the 3D action data. These first two components support the action recognition component and improve its accuracy. The model is trained end‐to‐end in a multitask manner, with parameters shared among the three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.
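The abstract does not specify layer sizes, the exact form of the spatial transformation, or how the multitask losses are weighted. The following is a minimal plain-Python sketch of how the three components could fit together, under stated assumptions: a standard GCN-style graph convolution with symmetric normalization D^-1/2 (A + I) D^-1/2, a depth head that predicts one z value per joint, a spatial transformation restricted to a rotation about the vertical axis plus a translation, and a weighted-sum multitask loss. All function names, weights, and the toy skeleton are hypothetical. This is not the authors' implementation, and in 2D‐SCHAR the transformation parameters are generated by the network rather than passed in by hand as they are here.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (a is n x k, b is k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """GCN-style normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:  # bones are undirected edges between joints
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_conv(a_hat: Matrix, x: Matrix, w: Matrix) -> Matrix:
    """Linear graph convolution A_hat X W: mix neighbouring joints, then project."""
    return matmul(matmul(a_hat, x), w)


def lift_to_3d(a_hat: Matrix, xy: Matrix, w_depth: Matrix) -> Matrix:
    """Hypothetical depth head: predict one z per joint from (x, y) and append it."""
    z = graph_conv(a_hat, xy, w_depth)  # J x 1, no activation so depth can be negative
    return [p + d for p, d in zip(xy, z)]  # J x 3 rows of (x, y, z)


def spatial_transform(xyz: Matrix, theta: float, shift: Sequence[float]) -> Matrix:
    """Rotate all joints about the vertical (y) axis by theta, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x + s * z + shift[0], y + shift[1], -s * x + c * z + shift[2]]
            for x, y, z in xyz]


def classify(a_hat: Matrix, xyz: Matrix, w_feat: Matrix, w_cls: Matrix) -> List[float]:
    """Recognition head: graph conv + ReLU, average over joints, linear classifier."""
    h = [[max(0.0, v) for v in row] for row in graph_conv(a_hat, xyz, w_feat)]
    pooled = [[sum(col) / len(h) for col in zip(*h)]]  # 1 x C joint-averaged features
    return matmul(pooled, w_cls)[0]  # one logit per action class


def multitask_loss(l_depth: float, l_transform: float, l_cls: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum for training the three components jointly (weights assumed)."""
    return weights[0] * l_depth + weights[1] * l_transform + weights[2] * l_cls


if __name__ == "__main__":
    # Toy 4-joint chain (hip, spine, neck, head); coordinates and weights are made up.
    edges = [(0, 1), (1, 2), (2, 3)]
    a_hat = normalized_adjacency(4, edges)
    xy = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [0.1, 1.2]]
    xyz = lift_to_3d(a_hat, xy, w_depth=[[0.3], [-0.2]])
    xyz = spatial_transform(xyz, theta=math.pi / 6, shift=(0.0, 0.0, 0.0))
    logits = classify(a_hat, xyz,
                      w_feat=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                      w_cls=[[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]])
    print(len(xyz), len(xyz[0]), len(logits))  # 4 3 3
```

A rigid rotation plus translation was chosen for the transformation because it changes the viewpoint without altering bone lengths, which is one plausible reading of "spatial constraints". In an end‐to‐end setting the heads would share graph-convolution layers and be trained on a loss like `multitask_loss`; the sketch omits training and gradient computation entirely.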

VW-SC3D: A Sparse 3D CNN-Based Spatial–Temporal Network with View Weighting for Skeleton-Based Action Recognition

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

A View-invariant Skeleton Map with 3DCNN for Action Recognition

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton Based Action Recognition

Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

A Novel View Attention Network for Skeleton Based Human Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition

Skeleton-based Attention-Aware Spatial-Temporal Model for Action Detection and Recognition

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

Exploring a Rich Spatial-Temporal Dependent Relational Model for Skeleton-Based Action Recognition by Bidirectional LSTM-CNN

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition

Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network

Lightweight Multi-Scale Spatiotemporal Graph Convolutional Network for Skeleton-Based Action Recognition

3D Action Recognition Using Multi-Temporal Skeleton Visualization

Multisource Learning for Skeleton-Based Action Recognition Using Deep LSTM and CNN

Channel attention and multi-scale graph neural networks for skeleton-based action recognition

2D human skeleton action recognition with spatial constraints

3D Action Recognition Using Data Visualization and Convolutional Neural Networks