Abstract: Action recognition based on 3D skeleton sequences has gained considerable attention in recent years. Because Covariance Matrix (CM) features effectively represent the spatial and temporal characteristics of skeleton sequences, combining them with a Long Short-Term Memory (LSTM) network is an effective and reasonable way to improve action recognition accuracy. However, the CM features in existing recognition models are computed from raw data either without normalization or with static normalization. Moreover, a CM feature is calculated from all coordinates in one frame, which treats the coordinates on all three axes identically and neglects the relationships among coordinates on the same axis. In this paper, an end-to-end deep learning framework is proposed that includes a normalization layer which adapts dynamically to the data distribution and the inference procedure. After normalization, three covariance feature sequences, one per coordinate axis, are produced from sliding windows and fused into one fusion matrix by a convolution layer. Finally, the fusion matrix is sequentially fed into an LSTM network to recognize the skeleton action. The novelty of the proposed framework lies in combining adaptive preprocessing and feature fusion with the LSTM network, improving recognition accuracy by optimizing the quality of the features rather than the network construction. In the experiments, the proposed framework is verified on public datasets and on one student action dataset collected from a real classroom. The experimental results demonstrate that the proposed method achieves a significant improvement in accuracy compared to state-of-the-art methods. It can be concluded that the proposed framework not only accurately captures the correlation of joints in the same frame but also effectively expresses the dependencies between sequential frames.
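
The feature-extraction steps named in the abstract (normalize, compute one covariance sequence per coordinate axis over sliding windows, fuse the three into one matrix per window) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration, not the paper's implementation: the function names, the window and stride values, the fixed z-score standing in for the learned adaptive normalization layer, and the fixed weighted sum standing in for the trained convolution layer. The LSTM classifier is omitted.

```python
import numpy as np


def axis_covariance_sequences(skeleton, window=8, stride=4):
    """Per-axis joint covariance over sliding windows.

    skeleton: array of shape (T, J, 3) = frames x joints x (x, y, z).
    Returns an array of shape (W, 3, J, J), W = number of windows.
    """
    T, J, _ = skeleton.shape
    # Assumption: a plain per-axis z-score replaces the paper's
    # adaptive (learned) normalization layer.
    mean = skeleton.mean(axis=(0, 1), keepdims=True)
    std = skeleton.std(axis=(0, 1), keepdims=True) + 1e-8
    normed = (skeleton - mean) / std

    windows = []
    for start in range(0, T - window + 1, stride):
        chunk = normed[start:start + window]  # (window, J, 3)
        # For each axis: joints are variables (rows), frames are observations.
        per_axis = [np.cov(chunk[:, :, a].T) for a in range(3)]
        windows.append(np.stack(per_axis))  # (3, J, J)
    return np.stack(windows)


def fuse_axes(cov_seq, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse the three axis channels into one matrix per window.

    Assumption: a fixed weighted sum (equivalent to a 1x1 convolution
    with hand-set weights) replaces the trained convolution layer.
    """
    w = np.asarray(weights, dtype=float).reshape(1, 3, 1, 1)
    return (cov_seq * w).sum(axis=1)  # (W, J, J)


# Usage with synthetic data: 32 frames, 20 joints.
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(32, 20, 3))
cov_seq = axis_covariance_sequences(skeleton)
fused = fuse_axes(cov_seq)
```

In a full pipeline, each fused matrix would be flattened (or its upper triangle taken, since covariance matrices are symmetric) and passed to the LSTM as one time step.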

Optimizing Features Quality: A Normalized Covariance Fusion Framework for Skeleton Action Recognition

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks

Hybrid Features for Skeleton-Based Action Recognition Based on Network Fusion

Joints-Centered Spatial-Temporal Features Fused Skeleton Convolution Network for Action Recognition

Full-Dimensional Optimizable Network: A Channel, Frame and Joint-Specific Network Modeling for Skeleton-Based Action Recognition

Skeleton Feature Fusion Based on Multi-Stream LSTM for Action Recognition

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition

Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition

Combine Multi-order Representation Learning and Frame Optimization Learning for Skeleton-based Action Recognition

Human Skeleton Feature Optimizer and Adaptive Structure Enhancement Graph Convolution Network for Action Recognition

Action Recognition Based on 3D Skeleton and RGB Frame Fusion

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Adaptive Spatiotemporal Graph Convolutional Network with Intermediate Aggregation of Multi-Stream Skeleton Features for Action Recognition

Fusing Higher-Order Features in Graph Neural Networks for Skeleton-Based Action Recognition

Skeleton-based Action Recognition with Multi-Stream, Multi-Scale Dilated Spatial-Temporal Graph Convolution Network

Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition

Fusing Shape and Motion Matrices for View Invariant Action Recognition Using 3D Skeletons

Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features

Skeleton-based Human Action Recognition by Fusing Attention Based Three-Stream Convolutional Neural Network and SVM

Global Co-occurrence Feature Learning and Active Coordinate System Conversion for Skeleton-based Action Recognition