Action Recognition Based on Fusion Skeleton of Two Kinect Sensors

Yujian Jiang, Kai Song, Jia Wang
DOI: https://doi.org/10.1109/iccst50977.2020.00052
2020-01-01
Abstract: With the development of human-computer interaction, action recognition has become an important topic in computer vision, and in recent years skeleton-based action recognition has become a research hotspot in the field. The human skeleton can be obtained with a Kinect sensor, but a single Kinect sensor is often affected by self-occlusion, which makes it impossible to accurately capture all of the skeleton joints of the human body. In this paper, a data fusion method is proposed. Two Kinect sensors are placed in a fixed space, orthogonal to each other, so that they extract the human skeleton from different perspectives. The two skeletons are then fused to obtain an accurate skeleton and eliminate the effect of self-occlusion. Finally, Spatial Temporal Graph Convolutional Networks (ST-GCN), which perform skeleton-based action recognition by automatically learning the spatial and temporal patterns in the data and thereby aim at accurate recognition of human body posture, are used to verify the validity of the proposed data fusion method. Specifically, the skeletons captured by two cameras in the large-scale NTU-RGB+D dataset are fused to form the dataset for this experiment. Experimental results show that the data fusion method is reasonable and effective.
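The abstract does not state the fusion rule itself, so the following is only a minimal sketch of one plausible way to fuse two skeletons, not the authors' method. It assumes that the second sensor's joints can be mapped into the first sensor's frame by a rotation about the vertical axis plus a translation, and that each joint carries a tracking state (the Kinect SDK reports joints as tracked, inferred, or not tracked). The weights in WEIGHTS, the function names, and the sensor placement used in the usage lines are all hypothetical.

```python
import math

# Hypothetical confidence weights per tracking state (an assumption,
# not values taken from the paper).
WEIGHTS = {"tracked": 1.0, "inferred": 0.25, "not_tracked": 0.0}


def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def to_reference_frame(point, angle_rad, translation):
    """Map a point from sensor B's frame into sensor A's frame."""
    rx, ry, rz = rotate_y(point, angle_rad)
    tx, ty, tz = translation
    return (rx + tx, ry + ty, rz + tz)


def fuse_joint(p_a, state_a, p_b, state_b):
    """Confidence-weighted average of two estimates of the same joint.

    Both points must already be expressed in the same frame.
    Returns None when neither sensor observed the joint.
    """
    w_a, w_b = WEIGHTS[state_a], WEIGHTS[state_b]
    total = w_a + w_b
    if total == 0.0:
        return None
    return tuple((w_a * a + w_b * b) / total for a, b in zip(p_a, p_b))


def fuse_skeletons(skel_a, skel_b, angle_rad, translation):
    """Fuse two skeletons given as lists of (point, tracking_state)."""
    fused = []
    for (p_a, s_a), (p_b, s_b) in zip(skel_a, skel_b):
        p_b_in_a = to_reference_frame(p_b, angle_rad, translation)
        fused.append(fuse_joint(p_a, s_a, p_b_in_a, s_b))
    return fused


# Usage: sensor B sits at (2, 0, 2) in A's frame and views the subject
# at a right angle to A. The second joint is occluded for sensor A.
skeleton_a = [((0.0, 1.0, 2.0), "tracked"), ((0.5, 0.5, 2.0), "not_tracked")]
skeleton_b = [((0.0, 1.0, 2.0), "tracked"), ((0.1, 0.5, 1.7), "tracked")]
fused_skeleton = fuse_skeletons(
    skeleton_a, skeleton_b, -math.pi / 2, (2.0, 0.0, 2.0)
)
```

With this weighting, a joint that one sensor cannot see is taken entirely from the other sensor, which is the situation the orthogonal placement is meant to cover. In practice the rotation and translation would come from a calibration step rather than being fixed by hand.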