DSPNet: A low computational-cost network for human pose estimation

Fujin Zhong,Mingyang Li,Kun Zhang,Jun Hu,Li Liu
DOI: https://doi.org/10.1016/j.neucom.2020.11.003
IF: 6
2021-01-01
Neurocomputing
Abstract: Existing human pose estimation methods usually have a high computational load, which is very unfavorable for resource-limited equipment. To address this issue, we propose a low computational-cost deep supervision pyramid network called DSPNet. Firstly, we design a lightweight up-sampling unit instead of transposed convolution as a decoder for the network. In the case of decreased computation, it has brought an increase in prediction accuracy. Secondly, we present a novel deep supervision pyramid architecture to improve the multi-scale obtaining ability of MSRA SimpleBaseline while not bringing any increase in the number of parameters. The experimental results on both MPII and COCO pose estimation benchmarks illustrate that DSPNet achieves almost equivalent state-of-the-art results with a low computational load. Especially, the computational cost of DSPNet is 12.7% of SimpleBaseline and the estimation accuracy is improved by 0.9 points when both methods use the same backbone network (EfficientNet) on MPII validation set.
computer science, artificial intelligence
### What problem does this paper attempt to solve?

This paper addresses the high computational cost of human pose estimation methods. Existing methods typically require a large amount of computational resources, which makes them ill-suited to resource-constrained devices such as smartphones. To solve this problem, the authors propose a low-computation-cost Deep Supervision Pyramid Network (DSPNet).

#### Main Contributions:

1. **Lightweight Upsampling Unit (LUSU)**: A lightweight upsampling unit is designed, combining separable transposed convolution, a channel attention mechanism, and a lightweight self-attention mechanism. This design maintains high estimation accuracy while reducing the number of parameters.
2. **Deep Supervision Pyramid Architecture (DSP)**: A novel deep supervision pyramid architecture is proposed, introducing multi-scale supervision and a coarse-to-fine refinement process into a single-stage network. By sharing weights, the ability to acquire multi-scale information is retained during training, while only a single-branch structure is used during inference.

#### Experimental Results:

Experiments show that DSPNet achieves results comparable to existing state-of-the-art methods on the MPII and COCO pose estimation benchmarks, but with significantly reduced computational cost. For example, on the MPII validation set, when using EfficientNet as the backbone network, DSPNet's computational cost is only 12.7% of SimpleBaseline's, while the estimation accuracy is improved by 0.9 points.
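
To illustrate why a separable transposed convolution, as named in the LUSU contribution, can reduce the number of parameters, the sketch below compares weight counts for a standard transposed convolution and a depthwise-plus-pointwise factorization. This is a back-of-the-envelope illustration, not the paper's actual layer layout: the factorization scheme, the channel widths (256), and the kernel size (4) are all assumptions chosen for the example, and the resulting ratio is unrelated to the 12.7% computational-cost figure reported above.

```python
def transposed_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k transposed convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_transposed_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in an assumed factorization: a depthwise k x k transposed
    convolution followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
C_IN, C_OUT, K = 256, 256, 4

standard = transposed_conv_params(C_IN, C_OUT, K)
separable = separable_transposed_conv_params(C_IN, C_OUT, K)
ratio = separable / standard

print(f"standard:  {standard:,} weights")
print(f"separable: {separable:,} weights")
print(f"separable / standard = {ratio:.4f}")
```

Under these assumed sizes the factorized layer needs 69,632 weights against 1,048,576 for the standard one, roughly 6.6%. The attention components of the LUSU add parameters of their own, which this count does not include.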