MSPENet: Multi-Scale Adaptive Fusion and Position Enhancement Network for Human Pose Estimation

Xu, Jia; Liu, Weibin; Xing, Weiwei; Wei, Xiang
DOI: https://doi.org/10.1007/s00371-022-02460-y
2023-01-01
Abstract: Human pose estimation is a fundamental yet challenging task in computer vision. With the adoption of deep neural networks, it has made great progress in recent years. However, existing pose estimation networks still have difficulty detecting small-scale keypoints and distinguishing semantically confusable keypoints. In this paper, a novel convolutional neural network named the multi-scale position enhancement network (MSPENet) is proposed to address these two problems. First, a multi-scale adaptive fusion unit is proposed to dynamically select and fuse features at different scales, allowing small-scale keypoints to obtain more detailed information that is beneficial for detection. Second, we observe that although parts with similar appearance are difficult to distinguish semantically, they differ significantly in spatial location. Therefore, a position enhancement module is designed to highlight features at the true joint locations while learning more discriminative features to suppress features of similar-looking joint regions. Finally, a global context block is applied to refine the prediction results and further improve network performance. Experiments on both single- and multi-person pose estimation benchmarks show that our approach yields more accurate and reliable results.
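To make the idea of dynamically selecting and fusing features across scales more concrete, the following is a minimal NumPy sketch. It is not the paper's fusion unit, whose design the abstract does not specify. The function names `upsample_nearest` and `adaptive_fuse` are hypothetical. The weighting scheme (a per-channel softmax over the globally average-pooled response of each scale), the use of nearest-neighbour upsampling, and the requirement that each coarser map divides the finest resolution by an integer factor are all assumptions made for illustration.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def adaptive_fuse(features):
    """Fuse a list of (C, H_i, W_i) feature maps into one (C, H, W) map.

    features[0] is assumed to be the highest-resolution map, and every
    other map is assumed to be smaller by an integer factor.
    The weighting below is an illustrative assumption, not the paper's unit.
    """
    target_h = features[0].shape[1]
    # Bring every scale to the finest resolution: shape (S, C, H, W).
    aligned = np.stack(
        [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    )
    # One descriptor per scale and channel via global average pooling: (S, C).
    descriptors = aligned.mean(axis=(2, 3))
    # Numerically stable softmax across the scale axis.
    shifted = descriptors - descriptors.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum over scales, broadcasting weights over H and W.
    fused = (weights[:, :, None, None] * aligned).sum(axis=0)
    return fused, weights
```

In a trained network the per-scale weights would typically be produced by learned layers rather than computed directly from pooled activations; the sketch only shows the select-then-sum structure that the abstract describes.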
What problem does this paper attempt to address?