DSPNet: A low computational-cost network for human pose estimation

Fujin Zhong, Mingyang Li, Kun Zhang, Jun Hu, Li Liu

DOI: https://doi.org/10.1016/j.neucom.2020.11.003

IF: 6

2021-01-01

Neurocomputing

Abstract: Existing human pose estimation methods usually have a high computational load, which is very unfavorable for resource-limited equipment. To address this issue, we propose a low computational-cost deep supervision pyramid network called DSPNet. Firstly, we design a lightweight up-sampling unit instead of transposed convolution as a decoder for the network. In the case of decreased computation, it has brought an increase in prediction accuracy. Secondly, we present a novel deep supervision pyramid architecture to improve the multi-scale obtaining ability of MSRA SimpleBaseline while not bringing any increase in the number of parameters. The experimental results on both MPII and COCO pose estimation benchmarks illustrate that DSPNet achieves almost equivalent state-of-the-art results with a low computational load. Especially, the computational cost of DSPNet is 12.7% of SimpleBaseline and the estimation accuracy is improved by 0.9 points when both methods use the same backbone network (EfficientNet) on MPII validation set.

computer science, artificial intelligence

### What problem does this paper attempt to address?

This paper aims to address the issue of high computational cost in human pose estimation methods on resource-constrained devices. Specifically, existing human pose estimation methods typically require a large amount of computational resources, which is detrimental to resource-constrained devices such as smartphones. To solve this problem, the authors propose a low-computation-cost Deep Supervision Pyramid Network (DSPNet).

#### Main Contributions:

1. **Lightweight Upsampling Unit (LUSU)**: A lightweight upsampling unit is designed, combining separable transposed convolution, channel attention mechanism, and lightweight self-attention mechanism. This design can maintain high estimation accuracy while reducing the number of parameters.
2. **Deep Supervision Pyramid Architecture (DSP)**: A novel deep supervision pyramid architecture is proposed, introducing multi-scale supervision and a coarse-to-fine refinement process into a single-stage network. By sharing weights, the ability to acquire multi-scale information is retained during training, while only a single-branch structure is used during inference.

#### Experimental Results:

Experiments show that DSPNet achieves results comparable to existing state-of-the-art methods on the MPII and COCO pose estimation benchmarks, but with significantly reduced computational cost. For example, on the MPII validation set, when using EfficientNet as the backbone network, DSPNet's computational cost is only 12.7% of SimpleBaseline's, while the estimation accuracy is improved by 0.9 points.
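The parameter saving behind the separable transposed convolution mentioned in the summary can be illustrated with a small weight count. This is only a sketch under stated assumptions: it takes "separable" to mean a depthwise transposed convolution followed by a 1x1 pointwise convolution, and the channel count (256) and kernel size (4) are illustrative values, not figures taken from the paper. The function names are hypothetical helpers, and the attention components of the upsampling unit are not modeled.

```python
def transposed_conv_params(c_in, c_out, k):
    """Weights in a standard transposed convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_transposed_conv_params(c_in, c_out, k):
    """Weights in a depthwise transposed conv plus a 1x1 pointwise conv.

    Assumption: this factorization is one common reading of
    "separable transposed convolution"; the paper's exact layer may differ.
    """
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes only, not taken from the paper.
C_IN, C_OUT, K = 256, 256, 4

standard = transposed_conv_params(C_IN, C_OUT, K)
separable = separable_transposed_conv_params(C_IN, C_OUT, K)
ratio = separable / standard

print(f"standard:  {standard:,} weights")   # 1,048,576
print(f"separable: {separable:,} weights")  # 69,632
print(f"ratio:     {ratio:.1%}")            # 6.6%
```

The count covers weights only. The actual computational cost also depends on feature-map resolution and on the attention modules, so this ratio should not be read as a reproduction of the 12.7% figure reported above.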


X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Context-Guided Adaptive Network for Efficient Human Pose Estimation

LSDNet: lightweight stochastic depth network for human pose estimation

Implicit Decouple Network for Efficient Pose Estimation

Simple and Lightweight Human Pose Estimation

Channel sifted model for pose estimation

Complementary Feature Pyramid Network for Human Pose Estimation

Human Pose Estimation Based on Efficient and Lightweight High-Resolution Network (EL-HRNet)

A Deconvolutional Bottom-up Deep Network for Multi-Person Pose Estimation

SLBNet: Shallow and Lightweight Bilateral Network for Pose Estimation

Ghost attentional down net: An effective lightweight top-down network for human pose estimation

A Lightweight Network Based on Pyramid Residual Module for Human Pose Estimation

Optimized S2E Attention Block based Convolutional Network for Human Pose Estimation

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Lightweight high-resolution network based on adaptive cross-dimensional weighting for human pose estimation

Towards Simple and Accurate Human Pose Estimation with Stair Network

EfficientPose: Scalable single-person pose estimation

Cascaded Pyramid Network for Multi-Person Pose Estimation

Deep Layer and Spatial Aggregation neural network for human pose estimation

SPCNet: Spatial Preserve and Content-aware Network for Human Pose Estimation