ASHN for Multi-Human Pose Estimation

Pan Gao, Zhuhua Hu
DOI: https://doi.org/10.1109/acait56212.2022.10137930
2022-01-01
Abstract: Human poses are highly diverse, and images containing several people suffer from occluded key points, differences in target scale, and blurred backgrounds. Multi-human pose estimation therefore remains a challenging task. Existing deep learning-based multi-human pose estimation methods are mainly divided into top-down and bottom-up approaches, but most of them do not make full use of local features in the network. In this paper, we use the convolutional block attention module (CBAM) and Focal L2 Loss to process the context information of the convolutional neural network and consolidate local features. Specifically, we propose the attention-containing stacked hourglass network (ASHN). ASHN is based on a stacked hourglass network, adds CBAM to improve performance, and combines it with Focal L2 Loss. Compared with existing methods, our method achieves competitive performance, reaching 66.8% AP, 72.1% AP75 and 65.4% APM on the COCO dataset.
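The abstract names Focal L2 Loss but does not give its formula. As a rough illustration only, the sketch below shows one simplified way a focal-style weight can be applied to a squared heatmap error, so that pixels the network already predicts well contribute less. The function name `focal_l2_loss`, the parameters `gamma` and `threshold`, and the exact weighting are assumptions made for this example, not the formulation used in the paper.

```python
def focal_l2_loss(pred, target, gamma=2.0, threshold=0.01):
    """Mean focal-weighted squared error over flattened heatmap values.

    Hypothetical, simplified form for illustration; `gamma` and
    `threshold` are assumed names and defaults, not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not pred:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, 0.0), 1.0)  # clamp the prediction into [0, 1]
        # s_t is close to 1 when the pixel is already predicted well:
        # a high response on a keypoint pixel, a low response on background.
        s_t = p if g > threshold else 1.0 - p
        weight = (1.0 - s_t) ** gamma  # down-weights easy pixels
        total += weight * (p - g) ** 2
    return total / len(pred)


if __name__ == "__main__":
    heatmap_pred = [0.9, 0.1, 0.5]
    heatmap_true = [1.0, 0.0, 1.0]
    print(focal_l2_loss(heatmap_pred, heatmap_true))
```

With `gamma=0.0` the weight is always 1 and the function reduces to a plain mean squared error, which makes the effect of the focal term easy to compare.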