Human Skeleton Action Recognition Based on Monocular Depth Estimation

Lei Wang, Jianwei Zhang, Shuhang Gu, Yu Jing, Yang Wei
DOI: https://doi.org/10.21203/rs.3.rs-3019986/v1
2023-01-01
Abstract: Human action recognition in video surveillance has limited accuracy because monocular cameras capture only 2D information. To address this problem, a human skeleton action recognition method based on monocular depth estimation (SARDE) is proposed in this study. The proposed method comprises two tasks, human skeleton action recognition and monocular depth estimation, with the aim of transforming 2D human action into 3D. The two tasks are integrated in a multi-task manner and trained end to end. Sharing learning data between them exploits the correlation between action recognition and depth estimation, so that the depth features of the human skeleton joints are learned more effectively for action recognition. Graph-structured depth estimation methods with inception blocks and skip connections are investigated. Experimental results verify the effectiveness of the proposed model in skeleton action recognition, and the model reaches state-of-the-art performance on the evaluated datasets.