Abstract: A novel 2D human skeleton action recognition model with spatial constraints, named 2D‐SCHAR, is introduced to address the ambiguity and uncertainty associated with human action recognition in 2D surveillance videos. These issues stem from the absence of depth information in the action data, so we concentrate on two main challenges, depth estimation and spatial transformation, to enhance recognition accuracy. The depth estimation component aims to reconstruct 3D action data from 2D inputs, while the spatial transformation component employs spatial constraints to adjust and rectify the 3D action data.

Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, relying on images for depth estimation requires extensive computational resources and cannot utilise the connectivity between human body structures. In addition, the estimated depth may not accurately reflect actual depth ranges, so its reliability needs to be improved. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D‐SCHAR) is introduced. 2D‐SCHAR employs graph convolutional networks to process graph‐structured human skeleton action data and comprises three parts: depth estimation, spatial transformation, and action recognition. The first two components, which infer 3D information from 2D human skeleton actions and generate spatial transformation parameters to correct abnormal deviations in the action data, support the third in enhancing the accuracy of action recognition. The model is designed in an end‐to‐end, multitask manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.
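The three-part structure described in the abstract (depth estimation, spatial transformation, action recognition on top of graph convolution) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `Toy2DSCHAR`, the layer sizes, the random untrained weights, and the choice of a single rotation about the vertical axis as the "spatial transformation parameter" are all assumptions made purely for illustration. Training and loss terms are omitted.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized skeleton adjacency with self-loops."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


class Toy2DSCHAR:
    """Hypothetical toy model: shared graph encoder feeding three heads."""

    def __init__(self, edges, num_joints, num_classes, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.a_hat = normalized_adjacency(edges, num_joints)
        # Shared encoder weights (assumed form of "parameter sharing").
        self.w_shared = rng.normal(0.0, 0.1, (2, hidden))
        # Depth head: one depth value per joint.
        self.w_depth = rng.normal(0.0, 0.1, (hidden, 1))
        # Transformation head: one rotation angle per sequence (assumption).
        self.w_theta = rng.normal(0.0, 0.1, hidden)
        # Recognition head: class scores from pooled 3D joints.
        self.w_cls = rng.normal(0.0, 0.1, (3, num_classes))

    def forward(self, joints_2d):
        """joints_2d has shape (frames, joints, 2)."""
        # Graph convolution: aggregate over neighbours, project, ReLU.
        h = np.maximum(self.a_hat @ joints_2d @ self.w_shared, 0.0)

        # 1) Depth estimation: append a predicted z to each 2D joint.
        depth = h @ self.w_depth
        joints_3d = np.concatenate([joints_2d, depth], axis=-1)

        # 2) Spatial transformation: rotate the lifted skeleton about the
        #    y axis by an angle predicted from pooled features.
        theta = float(h.mean(axis=(0, 1)) @ self.w_theta)
        c, s = np.cos(theta), np.sin(theta)
        rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        corrected = joints_3d @ rot_y.T

        # 3) Action recognition: graph aggregation, pooling, linear scores.
        logits = (self.a_hat @ corrected).mean(axis=(0, 1)) @ self.w_cls
        return {"joints_3d": joints_3d, "corrected": corrected, "logits": logits}


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]  # toy 5-joint skeleton
    model = Toy2DSCHAR(chain, num_joints=5, num_classes=3)
    clip = np.random.default_rng(1).normal(size=(4, 5, 2))
    out = model.forward(clip)
    print(out["corrected"].shape, out["logits"].shape)
```

In this sketch the depth and transformation heads both read the same encoded features, which is one simple way to realise the shared-parameter, multitask design the abstract mentions; the actual model may share parameters differently.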

Learning Composite Latent Structures for 3D Human Action Representation and Recognition

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Action Recognition Based on Global Optimal Similarity Measuring

Compositional Structure Learning for Action Understanding

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Learning Discriminative Activated Simplices for Action Recognition

Latent Semantic Learning with Structured Sparse Representation for Human Action Recognition

Effective Active Skeleton Representation for Low Latency Human Action Recognition

A Novel 3D Human Action Recognition Framework for Video Content Analysis

Deep set conditioned latent representations for action recognition

2D human skeleton action recognition with spatial constraints

Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

Recognizing Actions in 3D Using Action-Snippets and Activated Simplices

Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition

Spatio-temporal attention on manifold space for 3D human action recognition

A Novel Hierarchical Framework for Human Action Recognition