Abstract: The task of human action recognition (HAR) arises in many practical computer vision applications. Various data modalities have been considered for solving this task, including joint-based skeletal representations, which are suitable for real-time applications on platforms with limited computational resources. We propose a spatio-temporal neural network that uses handcrafted geometric features to classify human actions from video data. The proposed deep neural network architecture combines graph convolutional and temporal convolutional layers. Experiments on public HAR datasets show that our model obtains results similar to other state-of-the-art methods with a lower inference time, while also offering the possibility of obtaining an explanation for the classified action.

What problem does this paper attempt to address?

The problem this paper addresses is **human action recognition (HAR)**, in particular efficient, real-time recognition on platforms with limited computational resources. Specifically, the authors propose a spatio-temporal neural network that operates on skeleton data and uses hand-designed geometric features to improve action classification while reducing inference time and providing interpretability.

### Main problems and goals of the paper

1. **Efficient and real-time action recognition**: Existing deep-learning methods are accurate, but their large parameter counts make them hard to deploy directly on platforms with limited computational resources. The goal of the paper is therefore a recognition system that runs efficiently in resource-constrained environments.
2. **Reducing the dependence on training data**: Many existing methods are very sensitive to their training data and generalize poorly to unseen data. The proposed method aims to improve generalization so that the model can handle data that differs from the training data.
3. **Providing interpretability**: Most deep-learning models are black boxes whose decision-making is difficult to explain. The proposed model outputs feature tensors in addition to the predicted class, which provides an explanation for the classification result.

### Specific methods and techniques

To achieve these goals, the authors propose the following key techniques:

- **Hand-designed geometric features**: Geometric features such as joint velocities, bone lengths, and angles make the model more robust and adaptable. These features are designed to be invariant to an individual's body properties and movement speed, which improves the generalization ability of the model.
- **Spatio-temporal neural network architecture**: Graph convolutional (GCN) layers and temporal convolutional (TCN) layers are combined to capture spatial and temporal information simultaneously. This architecture achieves performance comparable to other state-of-the-art methods while keeping inference time low.
- **Non-black-box model**: By outputting feature tensors, the model can explain its predictions, giving insight into how a classification decision was reached.

### Experimental verification

The authors conducted experiments on the NTU RGB+D dataset to verify the effectiveness of the proposed method. The results show that the model is comparable to existing methods under multiple test protocols, with a shorter inference time and the ability to provide explanations.

### Summary

By introducing hand-designed geometric features and an improved spatio-temporal neural network architecture, the paper addresses the difficulty of deploying existing action recognition methods on platforms with limited computational resources, and it improves the generalization ability and interpretability of the model.
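To make the hand-designed geometric features named above more concrete, the following is a minimal NumPy sketch of joint velocities, bone lengths, and inter-bone angles computed from a skeleton sequence. It is an illustration under stated assumptions, not the authors' implementation: the 5-joint skeleton, the `BONES` list, the `(frames, joints, 3)` array layout, and all function names are hypothetical, and the paper's exact feature definitions and normalization may differ.

```python
import numpy as np

# Hypothetical toy skeleton (not from the paper): 5 joints, with bones
# given as (child, parent) joint-index pairs.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def joint_velocities(seq):
    """Frame-to-frame displacement of each joint.

    seq: array of shape (T, J, 3); returns shape (T - 1, J, 3).
    """
    return seq[1:] - seq[:-1]


def bone_vectors(seq, bones=BONES):
    """Vector from parent joint to child joint for every bone: (T, B, 3)."""
    child = np.array([c for c, _ in bones])
    parent = np.array([p for _, p in bones])
    return seq[:, child] - seq[:, parent]


def bone_lengths(seq, bones=BONES):
    """Euclidean length of every bone in every frame: (T, B)."""
    return np.linalg.norm(bone_vectors(seq, bones), axis=-1)


def bone_angles(seq, pairs, bones=BONES, eps=1e-8):
    """Angle in radians between pairs of bones.

    pairs: list of (i, j) indices into `bones`; returns shape (T, P).
    """
    v = bone_vectors(seq, bones)
    a = v[:, [i for i, _ in pairs]]
    b = v[:, [j for _, j in pairs]]
    dot = np.sum(a * b, axis=-1)
    # Guard against division by zero for degenerate (zero-length) bones.
    denom = np.maximum(
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1), eps
    )
    return np.arccos(np.clip(dot / denom, -1.0, 1.0))


# Usage on a made-up 3-frame sequence that drifts along the x axis.
pose = np.array(
    [[0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 2, 0], [-1, 1, 0]], dtype=float
)
seq = np.stack([pose + t * np.array([0.1, 0.0, 0.0]) for t in range(3)])
print(joint_velocities(seq).shape, bone_lengths(seq).shape)
print(bone_angles(seq, [(0, 1), (1, 3)]))
```

Angles between bones do not change when the whole skeleton is uniformly scaled, which is one simple way a feature can be insensitive to body size. How the paper obtains invariance to body properties and movement speed is not specified in this summary.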

Spatio-temporal neural network with handcrafted features for skeleton-based action recognition

Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition

Human Action Recognition of Spatiotemporal Parameters for Skeleton Sequences Using MTLN Feature Learning Framework

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data

Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks

Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network

Joints-Centered Spatial-Temporal Features Fused Skeleton Convolution Network for Action Recognition

Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition.

Multiple temporal scale aggregation graph convolutional network for skeleton-based action recognition

Skeleton-based Attention-Aware Spatial-Temporal Model for Action Detection and Recognition.

Deep learning-based multi-view 3D-human action recognition using skeleton and depth data

An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition

Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks

Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution

Dynamic Edge Convolutional Neural Network for Skeleton-Based Human Action Recognition

Temporal Enhanced Multi-Stream Graph Convolutional Neural Networks For Skeleton-Based Action Recognition