Human Pose Estimation Based on Feature Enhancement and Multi-Scale Feature Fusion

Dandan Cao, Weibin Liu, Weiwei Xing, Xiang Wei
DOI: https://doi.org/10.1007/s11760-022-02271-7
IF: 1.583
2022-01-01
Signal Image and Video Processing
Abstract: Human pose estimation has improved greatly with the development of deep neural networks. However, challenges remain, such as occlusions in images and the varying scales of the human body. In this study, we propose a novel convolutional neural network architecture based on a dual-attention mechanism and multi-scale feature fusion to predict keypoints and estimate the locations of human body parts in images. First, the feature enhancement module (FEM) enhances the local features of each feature map in the network using the dual-attention mechanism: channel attention selects the channels that need more attention, and spatial attention enhances the local features of each feature map at the spatial level. Second, we design a multi-scale feature fusion (MSFF) module that uses cascaded atrous convolutions to aggregate contextual information and enhance the expressiveness of features. Expanding the receptive field increases the multi-scale contextual information, which helps to detect adjacent keypoints. Finally, we introduce an improved upsampling module that jointly uses upsampling2D and transposed convolution to better regress the obtained feature maps to higher resolution and output heatmaps. Extensive experiments on the MPII and COCO human pose estimation benchmarks demonstrate the effectiveness of our network.
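The claim that a cascade of atrous convolutions expands the receptive (perceptual) field can be illustrated with a little arithmetic. The sketch below is not the authors' MSFF module: it assumes stride-1 layers with 3x3 kernels and hypothetical dilation rates of 1, 2, and 4, none of which are stated in the abstract, and the function names are illustrative only.

```python
from typing import Sequence


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a single atrous (dilated) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def cascaded_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 atrous convolutions applied in sequence."""
    rf = 1  # a single output position sees one input pixel before any layer
    for d in dilations:
        # each stride-1 layer widens the field by (effective kernel - 1)
        rf += effective_kernel(kernel_size, d) - 1
    return rf


# Hypothetical dilation rates; the abstract does not give the ones used.
rates = [1, 2, 4]
for depth in range(1, len(rates) + 1):
    print(depth, cascaded_receptive_field(3, rates[:depth]))
# prints:
# 1 3
# 2 7
# 3 15
```

Under these assumptions, three cascaded layers cover a 15-pixel span, whereas three ordinary 3x3 convolutions (dilation 1 throughout) would cover only 7, which is one way to see how the cascade gathers wider context at the same depth.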