Abstract: Human pose estimation is one of the most important and challenging problems in computer vision. It is applied in many areas of the field and is of considerable research significance. However, balancing a model’s parameter count and computational load against pose estimation accuracy remains difficult. In this study, we propose a Lightweight Cross-scale Feature Fusion Network (LCFFNet) that balances accuracy against computational load and parameter count. LCFFNet consists of the Lightweight HRNet-Like (LHRNet) network, the Cross-Resolution-Aware Semantics Module (CRASM), and the Adapt Feature Fusion Module (AFFM). Specifically, we first propose the lightweight LHRNet, which includes Dynamic Multi-scale Convolution Basic (DMSC-Basic) block, Basic block, and DMSC-Basic block submodules in the network’s three high-resolution subnetwork stages. The proposed dynamic multi-scale convolution in the DMSC-Basic block reduces the parameter count and complexity of LHRNet and can extract features of variable poses. The Basic block is introduced to maintain the model’s ability to express features. As a result, LHRNet not only makes the model more lightweight but also enhances its feature expression capability. Second, we propose CRASM, which enhances contextual semantic information and reduces the semantic gap between scales by fusing features from different scales. Finally, the proposed AFFM restores the spatial resolution of the semantically enhanced feature map from bottom to top and uses adaptive feature fusion to improve keypoint localization accuracy. Our method achieves 74.2% AP, 89.9% PCKh@0.5, and 66.9% AP on the MSCOCO 2017, MPII, and CrowdPose datasets, respectively.
Our model reduces the number of parameters by 89.0% and the computational complexity by 87.5% compared with HRNet. The proposed network performs on par with current large human pose estimation networks while outperforming state-of-the-art lightweight networks.
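
For readers unfamiliar with the PCKh@0.5 metric reported on MPII, the following is a minimal sketch of how such a score is typically computed: a predicted keypoint counts as correct when its distance to the ground truth is at most 0.5 times a per-person head size. The function name `pckh`, its arguments, and the convention of marking unannotated joints with `None` are illustrative assumptions, not the evaluation code used by the authors. How the head size is derived from the annotations is left to the caller.

```python
import math


def pckh(pred, gt, head_sizes, threshold=0.5):
    """Fraction of annotated keypoints predicted within threshold * head size.

    pred, gt: one list per person, each holding (x, y) keypoints.
              A ground-truth entry of None marks an unannotated joint
              (an assumption of this sketch) and is skipped.
    head_sizes: per-person reference head length in pixels.
    """
    correct = 0
    total = 0
    for person_pred, person_gt, head in zip(pred, gt, head_sizes):
        for p, g in zip(person_pred, person_gt):
            if g is None:
                continue  # unannotated joint, not counted
            total += 1
            if math.dist(p, g) <= threshold * head:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical toy data: one person, three joints, the last one unannotated.
pred = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]]
gt = [[(10.0, 12.0), (20.0, 40.0), None]]
score = pckh(pred, gt, head_sizes=[10.0])
```

In this toy case the first joint is 2 pixels off and the second is 20 pixels off, so with a head size of 10 pixels only the first falls within the 5-pixel tolerance.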

Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection

CFENet: Content-aware Feature Enhancement Network for Multi-Person Pose Estimation

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Adaptively Fusing Complete Multi-resolution Features for Human Pose Estimation

Human Pose Estimation Based on Feature Enhancement and Multi-Scale Feature Fusion

EANet: Towards Lightweight Human Pose Estimation With Effective Aggregation Network

Full Scale-Aware Balanced High-Resolution Network for Multi-Person Pose Estimation

MSPENet: Multi-Scale Adaptive Fusion and Position Enhancement Network for Human Pose Estimation

Improving Human Pose Estimation Based on Stacked Hourglass Network

A Multi-stage Feature Fusion Network for Human Pose Estimation

A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation

Multi Hybrid Extractor Network for 3D Human Pose Estimation

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Enhancement and Optimisation of Human Pose Estimation with Multi-Scale Spatial Attention and Adversarial Data Augmentation

A Multi-Level Network for Human Pose Estimation

AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression

LCFFNet: A Lightweight Cross-scale Feature Fusion Network for Human Pose Estimation

InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation

Multi-Scale Structure-Aware Network for Human Pose Estimation

Complementary Feature Pyramid Network for Human Pose Estimation

Adaptive Multi-Path Aggregation for Human DensePose Estimation in the Wild