Learning high resolution reservation for human pose estimation

Bingkun Gao, Ke Ma, Hongbo Bi, Ling Wang, Chenlei Wu
DOI: https://doi.org/10.1007/s11042-021-10731-4
IF: 2.577
2021-01-01
Multimedia Tools and Applications
Abstract: Human pose estimation in images and videos is a challenging task in many applications. Most network structures used for pose estimation rely only on the convolutional features of the last layer, which causes a loss of information. In this paper, we propose a multi-scale fusion framework based on the hourglass network for human pose estimation that effectively captures information at different resolutions. While extracting features at different resolutions, the network continually supplements the high-resolution features. Additionally, we design a depth pyramid residual module to fuse features at various scales. The whole network is built by stacking sub-networks. To better suit settings with limited storage, we use only a 2-stage stacked network. We test the network on the standard MPII benchmark dataset: our method achieves a PCKh score of 88.9% and improves the PCK score by 0.7% compared with the original network. Our approach achieves state-of-the-art results.
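For readers unfamiliar with the PCKh metric reported above, the following is a minimal sketch of how such a score can be computed, assuming the common PCKh@0.5 convention: a predicted joint counts as correct when its distance to the ground truth is at most half the person's head segment length. The function name `pckh`, its parameters, and the toy coordinates are illustrative assumptions, not the authors' evaluation code; the official MPII evaluation also handles head size normalization and joint visibility, which are omitted here.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of joints whose error is within alpha * head size.

    pred, gt:    lists of poses, each pose a list of (x, y) joints
    head_sizes:  one head segment length (in pixels) per pose
    alpha:       threshold factor; 0.5 is the assumed PCKh@0.5 setting
    """
    correct = 0
    total = 0
    for pred_pose, gt_pose, head in zip(pred, gt, head_sizes):
        threshold = alpha * head
        for (px, py), (gx, gy) in zip(pred_pose, gt_pose):
            total += 1
            # Euclidean distance between prediction and ground truth
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


# Toy data (hypothetical): one person, two joints, head size 20 px,
# so the threshold is 10 px. The first joint is off by 2 px, the second by 20 px.
gt = [[(10.0, 10.0), (50.0, 50.0)]]
pred = [[(12.0, 10.0), (50.0, 70.0)]]
score = pckh(pred, gt, head_sizes=[20.0])
print(f"PCKh@0.5 = {score:.2f}")
```

Normalizing by head size rather than by a fixed pixel distance keeps the threshold comparable across people who appear at different scales in the image.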