Spatio-temporal neural network with handcrafted features for skeleton-based action recognition

Mihai Nan, Mihai Trăscău, Adina-Magda Florea
DOI: https://doi.org/10.1007/s00521-024-09559-4
2024-02-24
Neural Computing and Applications
Abstract: The task of human action recognition (HAR) can be found in many computer vision practical applications. Various data modalities have been considered for solving this task, including joint-based skeletal representations which are suitable for real-time applications on platforms with limited computational resources. We propose a spatio-temporal neural network that uses handcrafted geometric features to classify human actions from video data. The proposed deep neural network architecture combines graph convolutional and temporal convolutional layers. The experiments performed on public HAR datasets show that our model obtains results similar to other state-of-the-art methods but has a lower inference time while offering the possibility to obtain an explanation for the classified action.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses **human action recognition (HAR)**, in particular efficient, real-time recognition on platforms with limited computational resources. The authors propose a spatio-temporal neural network that operates on skeleton data and uses hand-designed geometric features to improve action classification while reducing inference time and providing interpretability.

### Main problems and goals of the paper

1. **Efficient and real-time action recognition**: Existing deep-learning methods achieve good accuracy, but they are difficult to deploy directly on platforms with limited computational resources because the models usually have a large number of parameters. The goal of this paper is an action recognition system that runs efficiently in resource-constrained environments.
2. **Reducing the dependence on training data**: Many existing methods are very sensitive to their training data and do not generalize well to unseen data. The proposed method aims to improve generalization so that the model can handle new data that differs from the training data.
3. **Providing interpretability**: Most deep-learning models are black boxes whose decision-making processes are difficult to explain. The proposed model performs action classification and also outputs feature tensors, which provide an explanation for the classification results.

### Specific methods and techniques

To achieve these goals, the authors propose the following key techniques:

- **Hand-designed geometric features**: Geometric features such as joint velocities, bone lengths, and angles make the model more robust and adaptable. These features are designed to be invariant to an individual's body properties and movement speeds, which improves the generalization ability of the model.
- **Spatio-temporal neural network architecture**: The architecture combines graph convolutional (GCN) layers and temporal convolutional (TCN) layers to capture spatial and temporal information together. It achieves performance comparable to other state-of-the-art methods while keeping inference time low.
- **Non-black-box model**: By outputting feature tensors, the model can explain its predictions and give insight into how a classification decision was reached.

### Experimental verification

The authors conducted experiments on the NTU RGB+D dataset to verify the effectiveness of the proposed method. The results show that the model is comparable to existing methods under multiple test protocols, while having a shorter inference time and the ability to provide explanations.

### Summary

This paper addresses the difficulty of deploying existing action recognition methods on platforms with limited computational resources. It does so by introducing hand-designed geometric features and an improved spatio-temporal neural network architecture, which also improve the generalization ability and interpretability of the model.
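
As an illustration of the hand-designed geometric features mentioned above (joint velocities, bone lengths, and angles), the following is a minimal NumPy sketch. It is not the authors' implementation: the 5-joint skeleton, the `BONES` and `ANGLES` index lists, and the function name `geometric_features` are hypothetical choices made for this example, and the paper's exact feature definitions and normalization may differ.

```python
import numpy as np

# Hypothetical toy skeleton with 5 joints (assumption, not from the paper).
# Each bone is a (child, parent) pair of joint indices.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]
# Each angle (a, b, c) is measured at joint b, between b->a and b->c.
ANGLES = [(0, 1, 2), (2, 1, 3)]


def geometric_features(joints, eps=1e-8):
    """Compute simple geometric features from a skeleton sequence.

    joints: array of shape (T, J, 3) with T frames and J joints.
    Returns (velocity, bone_len, angle) with shapes
    (T, J, 3), (T, len(BONES)), (T, len(ANGLES)).
    """
    joints = np.asarray(joints, dtype=float)

    # Joint velocity: frame-to-frame displacement (first frame set to zero).
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]

    # Bone length: Euclidean distance between each child and parent joint.
    child = [c for c, _ in BONES]
    parent = [p for _, p in BONES]
    bone_len = np.linalg.norm(joints[:, child] - joints[:, parent], axis=-1)

    # Joint angle: angle between the two vectors leaving the middle joint.
    a, b, c = (list(idx) for idx in zip(*ANGLES))
    u = joints[:, a] - joints[:, b]
    v = joints[:, c] - joints[:, b]
    denom = np.maximum(
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1), eps
    )
    cos = np.clip((u * v).sum(axis=-1) / denom, -1.0, 1.0)
    angle = np.arccos(cos)

    return velocity, bone_len, angle


# Usage: two frames of the toy skeleton, the second shifted along x.
frame = np.array(
    [[0, 0, 0], [0, 1, 0], [1, 1, 0], [-1, 1, 0], [0, 2, 0]], dtype=float
)
sequence = np.stack([frame, frame + np.array([0.5, 0.0, 0.0])])
velocity, bone_len, angle = geometric_features(sequence)
```

Bone lengths and angles in this sketch are unchanged by translating the whole skeleton, which hints at why such features can be less sensitive to where a person stands. Invariance to body size or movement speed would need additional normalization, which is not shown here.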