Abstract: Human behaviour recognition is an important research direction in computer vision, with broad application prospects in areas such as human–computer interaction, smart healthcare, video surveillance, and sports motion analysis. However, current skeleton‐based behaviour recognition methods that use graph convolutional networks still face challenges, such as difficulty in fully exploiting the dependencies among distant nodes and in distinguishing similar actions. To address the limitations of existing graph convolution‐based models in distinguishing similar actions, a multi‐stream hierarchical perception graph convolutional network model that incorporates angle features is proposed. The model introduces four new angle feature representations to capture subtle variations in different body parts, providing discriminative features for differentiating action details. It also uses a key angle feature enhancement module to strengthen the angle features that matter most for specific actions. The model achieves recognition accuracies of 92.8% and 96.8% under the cross‐subject and cross‐view evaluation criteria of the NTU‐RGB+D dataset, respectively, and 89.2% and 90.8% under the cross‐subject and cross‐setup evaluation criteria of the NTU‐RGB+D 120 dataset. The experimental results show that angle information improves the model's accuracy and its ability to distinguish similar actions.
Distinguishing similar actions has been a persistent challenge in skeleton‐based action recognition. Because the joint coordinates in such actions are similar, it is difficult to accomplish the recognition task using traditional joint features alone. To address this issue, angle features are proposed to capture subtle differences in various body parts, together with a critical angle enhancement module that assigns weights to the different angle feature representations for a given action and thereby highlights the critical ones. The approach is evaluated using a three‐stream ensemble method on three large action recognition datasets: NTU‐RGB+D, NTU‐RGB+D 120, and Kinetics‐400. The experimental results demonstrate that angular information effectively complements joint and skeletal features, leading to improved recognition of similar actions and enhanced model performance and robustness.
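
To make the idea of an angle feature concrete, the following is a minimal sketch of how the angle at one joint could be computed from 3D skeleton coordinates. It is a generic illustration only: the abstract does not specify the four angle representations, so the function names `joint_angle` and `angle_sequence` and the joint indices used below are hypothetical and are not taken from the paper.

```python
import math

def joint_angle(a, vertex, b, eps=1e-8):
    """Angle in radians at `vertex` between the rays vertex->a and vertex->b.

    Hypothetical helper for illustration; not the paper's definition.
    """
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Guard against division by zero when two joints coincide;
    # in that degenerate case the cosine is 0 and the result is pi/2.
    denom = max(norm_u * norm_v, eps)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / denom))
    return math.acos(cos_theta)

def angle_sequence(frames, i, j, k):
    """Per-frame angle at joint j formed with joints i and k.

    `frames` is a list of frames, each a list of (x, y, z) joint positions.
    The indices i, j, k are assumptions chosen by the caller.
    """
    return [joint_angle(f[i], f[j], f[k]) for f in frames]

# Toy example: three joints standing in for shoulder, elbow, wrist.
frames = [
    [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # bent arm
    [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0)],  # straight arm
]
elbow_angles = angle_sequence(frames, 0, 1, 2)
```

Two actions with nearly identical joint positions can still differ in such an angle over time, which is the intuition behind using angle features alongside joint and bone features.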

Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features

Action Recognition Based on 3D Skeleton and RGB Frame Fusion

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition

Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition

Skeleton Feature Fusion Based on Multi-Stream LSTM for Action Recognition

A Skeleton-Based Assembly Action Recognition Method with Feature Fusion for Human-Robot Collaborative Assembly

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks

3D Action Recognition Using Multi-Temporal Skeleton Visualization

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Action Recognition Based on Global Optimal Similarity Measuring

Fusing Higher-Order Features in Graph Neural Networks for Skeleton-Based Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Human-centric multimodal fusion network for robust action recognition

Attention-Based Multiview Re-Observation Fusion Network for Skeletal Action Recognition

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Multi-source Learning for Skeleton-Based Action Recognition Using Deep LSTM Networks

Skeleton Focused Human Activity Recognition in RGB Video

Skeleton-Based Action Recognition With Low-Level Features of Adaptive Graph Convolutional Networks

Fusing angular features for skeleton‐based action recognition using multi‐stream graph convolution network

Multi‐temporal scale aggregation refinement graph convolutional network for skeleton‐based action recognition