Abstract: In real-world applications of human pose estimation, low-resolution input images are frequently encountered when the image acquisition equipment has limited capability or the shooting distance is large. However, existing state-of-the-art human pose estimation models perform poorly on low-resolution images. One key reason is that these models contain downsampling layers, e.g., strided convolutions and pooling layers, which further reduce the already insufficient image information. Another key reason is that body skeleton and human kinematic information are not fully exploited. In this work, we propose a Multi-Granular Information-Lossless (MGIL) model that replaces the downsampling layers to address these issues. Specifically, MGIL employs a Fine-grained Lossless Information Extraction (FLIE) module, which prevents the loss of local information. Furthermore, we design a Coarse-grained Information Interaction (CII) module to fully leverage human body structural information. To efficiently fuse cross-granular information and thoroughly exploit the relationships among keypoints, we further introduce a Multi-Granular Adaptive Fusion (MGAF) mechanism, which assigns weights to features of different granularities based on the image content. The model is effective, flexible, and universal, and comprehensive experiments demonstrate its potential in various vision tasks. It outperforms SOTA methods by 7.7 mAP on COCO and performs well across different input resolutions, backbones, and vision tasks. The code is provided in the supplementary material.
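
The abstract does not say how FLIE avoids discarding pixels or how MGAF computes its weights, so the following is only a minimal sketch of the two underlying ideas under stated assumptions. For lossless downsampling it uses space-to-depth rearrangement, a standard invertible alternative to strided downsampling (assumed here for illustration, not confirmed as FLIE's design), and contrasts it with max pooling. For adaptive fusion it takes a softmax-weighted sum of same-shaped feature maps, where hypothetical per-branch scores stand in for whatever learned, content-dependent weighting MGAF actually uses. All function names are hypothetical, and single-channel 2D lists keep it dependency-free.

```python
import math


def space_to_depth(img, block=2):
    """Split an H x W single-channel image into block*block channels of size
    (H/block) x (W/block). Every input pixel is kept, so the step is invertible.
    (Illustrative assumption; not necessarily how FLIE is built.)"""
    h, w = len(img), len(img[0])
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    channels = []
    for dy in range(block):          # channel index = dy * block + dx
        for dx in range(block):
            channels.append([[img[y + dy][x + dx] for x in range(0, w, block)]
                             for y in range(0, h, block)])
    return channels


def depth_to_space(channels, block=2):
    """Inverse of space_to_depth: put each channel's pixels back at its offset."""
    oh, ow = len(channels[0]), len(channels[0][0])
    img = [[0] * (ow * block) for _ in range(oh * block)]
    for idx, ch in enumerate(channels):
        dy, dx = divmod(idx, block)
        for y in range(oh):
            for x in range(ow):
                img[y * block + dy][x * block + dx] = ch[y][x]
    return img


def max_pool(img, block=2):
    """Strided max pooling: keeps one value per block and discards the rest."""
    h, w = len(img), len(img[0])
    return [[max(img[y + dy][x + dx] for dy in range(block) for dx in range(block))
             for x in range(0, w, block)]
            for y in range(0, h, block)]


def adaptive_fuse(features, scores):
    """Fuse same-shaped feature maps with softmax weights over per-branch scores.
    The scores are hypothetical placeholders for a learned, content-dependent
    scoring layer; the weights are positive and sum to 1."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    h, w = len(features[0]), len(features[0][0])
    fused = [[sum(wt * f[y][x] for wt, f in zip(weights, features)) for x in range(w)]
             for y in range(h)]
    return fused, weights
```

For example, with the 4 x 4 image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]`, `depth_to_space(space_to_depth(img))` returns the original image unchanged, while `max_pool(img)` returns `[[6, 8], [14, 16]]`; changing any non-maximal pixel (say, 1 to 0) leaves the pooled output identical, which is the kind of information loss the abstract attributes to downsampling layers.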

Learning high resolution reservation for human pose estimation

Adaptively Fusing Complete Multi-resolution Features for Human Pose Estimation

Deep High-Resolution Representation Learning for Human Pose Estimation

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Multi-Scale Structure-Aware Network for Human Pose Estimation

Complementary Feature Pyramid Network for Human Pose Estimation

Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

Multi-person pose estimation using atrous convolution

Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation

Stacked Hourglass Networks for Human Pose Estimation

An improved lightweight high-resolution network based on multi-dimensional weighting for human pose estimation

Lightweight high-resolution network based on adaptive cross-dimensional weighting for human pose estimation

Multi-Scale Supervised Network for Human Pose Estimation

Improving Human Pose Estimation Based on Stacked Hourglass Network

A Lightweight Network Based on Pyramid Residual Module for Human Pose Estimation

Composite Localization for Human Pose Estimation

Human Pose Estimation Based on Efficient and Lightweight High-Resolution Network (EL-HRNet)

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Graph U-Shaped Network with Mapping-Aware Local Enhancement for Single-Frame 3D Human Pose Estimation