Optimizing Features Quality: A Normalized Covariance Fusion Framework for Skeleton Action Recognition.

Guan Huang, Qiuyan Yan
DOI: https://doi.org/10.1109/access.2020.3037238
IF: 3.9
2020-01-01
IEEE Access
Abstract: Action recognition based on 3D skeleton sequences has gained considerable attention in recent years. Because Covariance Matrix (CM) features effectively represent the spatial and temporal characteristics of skeleton sequences, combining them with a Long Short-Term Memory (LSTM) network is an effective and reasonable route to higher action recognition accuracy. However, in existing recognition models the CM features are computed from raw data, either without normalization or with static normalization. Moreover, each CM feature is calculated from all coordinates in a frame, treating the coordinates on the three axes identically and neglecting the relationships among coordinates on the same axis. In this paper, an end-to-end deep learning framework is proposed that includes a normalization layer which adapts dynamically to the data distribution and the inference procedure. After normalization, three covariance feature sequences, one for the coordinates on each axis, are produced using sliding windows and fused into a single fusion matrix by a convolution layer. Finally, the sequence of fusion matrices is fed into an LSTM network to recognize the skeleton action. The novelty of the proposed framework lies in combining adaptive preprocessing and feature fusion with the LSTM network, improving recognition accuracy by optimizing the quality of the features rather than the network construction. In the experiments, the proposed framework is evaluated on public datasets and on a student action dataset collected in a real classroom. The experimental results show that the proposed method achieves a significant improvement in accuracy over state-of-the-art methods. It can be concluded that the proposed framework not only accurately captures the correlation of joints within the same frame but also effectively expresses the dependencies between sequential frames.
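The per-axis covariance step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each frame is a list of J joints given as (x, y, z) tuples, and that each per-axis CM is the J × J sample covariance of that axis's joint coordinates across the frames of one sliding window. The convolutional fusion layer is replaced by a hypothetical 1 × 1 convolution (a per-entry weighted sum of the three matrices plus a bias), since the abstract does not give the kernel size. The function names, the `window` and `stride` parameters, and the fusion weights are all hypothetical, and the adaptive normalization layer and the LSTM are not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Joint = Tuple[float, float, float]  # assumed (x, y, z) layout per joint
Frame = Sequence[Joint]


def covariance(samples: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance (divisor n - 1) of the row vectors in `samples`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(d)]
    return [
        [
            sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def per_axis_covariance_sequence(
    frames: Sequence[Frame], window: int, stride: int = 1
) -> List[List[Matrix]]:
    """Hypothetical helper: slide a window over the frames and, at each
    position, return three J x J covariance matrices (x, y, z axis), each
    computed only from that axis's joint coordinates inside the window."""
    if not 2 <= window <= len(frames):
        raise ValueError("window must be between 2 and the number of frames")
    sequence = []
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window]
        per_axis = []
        for axis in range(3):
            # One sample per frame: the J joint coordinates on this axis.
            samples = [[joint[axis] for joint in frame] for frame in clip]
            per_axis.append(covariance(samples))
        sequence.append(per_axis)
    return sequence


def fuse_1x1(
    mats: Sequence[Matrix], weights: Sequence[float], bias: float = 0.0
) -> Matrix:
    """Hypothetical stand-in for the fusion convolution: a 1 x 1 convolution
    with three input channels and one output channel, i.e. a per-entry
    weighted sum of the three matrices plus a bias."""
    size = len(mats[0])
    return [
        [bias + sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(size)]
        for i in range(size)
    ]
```

For example, three frames of two joints, `[(0, 1, 2), (1, 1, 0)]`, `[(1, 2, 2), (3, 1, 0)]` and `[(2, 3, 2), (5, 1, 0)]`, with `window=3` give one window position whose x-axis matrix is `[[1.0, 2.0], [2.0, 4.0]]`. Computing a separate matrix per axis keeps the joint-to-joint relationships on each axis apart, which is the property the abstract contrasts with a single CM built from all coordinates at once. In a trained model, the fusion weights would be learned rather than fixed.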