Complementary Feature Pyramid Network for Human Pose Estimation

Yanhao Cheng, Weibin Liu, Weiwei Xing
DOI: https://doi.org/10.1109/ijcnn52387.2021.9534038
2021-07-18
Abstract: Human pose estimation plays an important role in human action recognition, human-computer interaction, and animation. Most existing methods use cascaded pyramid or stacked hourglass networks to fuse multi-scale features from different levels, which greatly improves accuracy but incurs a large computational cost, making real-time human pose estimation difficult. In this paper, we propose a new network, the Complementary Feature Pyramid Network (CFPNet), for human pose estimation, with a focus on efficient multi-scale feature generation and fusion. CFPNet extends the range of receptive fields of each network layer with the Feature Mix Bottleneck (FMB) block, which constructs hierarchical connections and mixes features from multiple receptive fields within a single bottleneck block. To reduce redundant gradient information during network optimization and to build a lightweight network, Cross Stage Partial (CSP) connections are introduced into CFPNet. We also propose the Complementary Feature Fusion (CFF) block, which adaptively selects complementary information from different levels for fusion to maximize the effective features in the output of CFPNet. With these improvements, CFPNet provides richer multi-scale features at lower model complexity. In particular, CFPNet-101 achieves 72.3% AP at 31.7 FPS on the MS COCO dataset with only 1.96 GFLOPs and 10.5M parameters. Compared with existing methods, CFPNet has competitive accuracy and can run in real time.
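The abstract does not spell out how the CSP connection is wired inside CFPNet. As a rough illustration of the general Cross Stage Partial idea only (split the channels, send one part through the costly block, bypass the rest, then concatenate), a minimal sketch might look like the following. The names `csp_stage`, `double`, and `split_ratio` are hypothetical, and plain Python lists stand in for feature maps; this is not the authors' implementation.

```python
from typing import Callable, List

Channel = List[float]        # one flattened feature channel
FeatureMap = List[Channel]   # a feature map as a list of channels


def csp_stage(x: FeatureMap,
              block: Callable[[FeatureMap], FeatureMap],
              split_ratio: float = 0.5) -> FeatureMap:
    """Send only part of the channels through `block`; bypass the rest.

    Assumption: a simple front/back channel split. The real CFPNet
    layout is not described in the abstract.
    """
    n_bypass = int(len(x) * split_ratio)
    bypass, active = x[:n_bypass], x[n_bypass:]
    processed = block(active)
    # Concatenate along the channel axis: untouched part + processed part.
    return bypass + processed


def double(channels: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for an expensive block such as a bottleneck."""
    return [[2.0 * v for v in ch] for ch in channels]


features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = csp_stage(features, double)
print(out)
```

Because only the second half of the channels passes through `block`, the cost of that block scales with the reduced channel count, which is the usual motivation for this kind of split.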