Abstract: Human pose estimation aims to locate the anatomical parts, or keypoints, of the human body and is regarded as a core component of detailed human understanding in images and videos. However, occlusion and overlap between human bodies, together with complex backgrounds, often result in implausible pose predictions. To address this problem, we propose a structure-aware adversarial framework that combines cues of local joint interconnectivity with priors about the holistic structure of human bodies, achieving high-quality results for multiperson pose estimation. Learning such cues and priors effectively is typically a challenge. The presented framework uses a nonparametric representation, referred to as the Keypoint Biorientation Field (KBOF), to learn orientation cues of joint interconnectivity in the image, much as human vision exploits geometric constraints of joint interconnectivity. Additionally, a module that uses multiscale feature representation with inflated convolution is applied to both joint heatmap detection and KBOF detection, so that the framework fully explores the local features of joint points and the bidirectional connectivity between them at the microscopic level. Finally, we employ an improved generative adversarial network that uses KBOF and multiscale feature extraction to implicitly leverage the cues and priors about the structure of human bodies for global structural inference. The adversarial network enables our framework to combine information about the connections between local body joints at the microscopic level with the structural priors of the human body at the global level, thus enhancing its performance. The effectiveness and robustness of the network are evaluated on human pose prediction on two widely used benchmark datasets, MPII and COCO. Our approach outperforms the state-of-the-art methods, especially in complex scenes: it achieves improvements of 2.6% and 1.7% over the latest method on the MPII test set and the COCO validation set, respectively.
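
The abstract does not give the layer configuration of the multiscale module. The following is only a minimal sketch, assuming that "inflated convolution" refers to standard dilated (atrous) convolution. It illustrates, in one dimension, how the same small kernel covers a wider context as the dilation rate grows. The function names, the example kernel, and the dilation rates (1, 2, 4) are hypothetical and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D cross-correlation with `dilation` steps between kernel taps.

    Assumption: this stands in for the paper's "inflated convolution".
    """
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    out = []
    for start in range(len(signal) - span):
        out.append(
            sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        )
    return out


def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multiscale_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (hypothetical rates)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    ramp = list(range(10))   # toy input: 0, 1, ..., 9
    diff = [1, 0, -1]        # toy 3-tap difference kernel
    for d, feats in multiscale_features(ramp, diff).items():
        print(d, receptive_field(len(diff), d), feats)
```

With a 3-tap kernel, the dilation rates 1, 2, and 4 cover 3, 5, and 9 input positions respectively without adding parameters, which is the usual motivation for combining several rates into one multiscale representation.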

Multi-Scale Structure-Aware Network for Human Pose Estimation

Multi-Scale Supervised Network for Human Pose Estimation

Multi-Scale Adaptive Structure Network For Human Pose Estimation From Color Images

MSPENet: Multi-Scale Adaptive Fusion and Position Enhancement Network for Human Pose Estimation

A Multi-Level Network for Human Pose Estimation

Human Pose Estimation Based on Feature Enhancement and Multi-Scale Feature Fusion

Enhancement and Optimisation of Human Pose Estimation with Multi-Scale Spatial Attention and Adversarial Data Augmentation

Scale-aware Attention-Based Multi-Resolution Representation for Multi-Person Pose Estimation

Multi-person pose estimation using atrous convolution

Rethinking on Multi-Stage Networks for Human Pose Estimation

A Multi-stage Feature Fusion Network for Human Pose Estimation

A Structure-Aware Adversarial Framework with the Keypoint Biorientation Field for Multiperson Pose Estimation

Learning Recurrent Structure-Guided Attention Network for Multi-person Pose Estimation.

A Multi-scale Recalibrated Approach for 3D Human Pose Estimation.

Combining detailed appearance and multi-scale representation: a structure-context complementary network for human pose estimation

Learning Joint Structure for Human Pose Estimation

Multi-scale Recalibration with Advanced Geometry Constraints for 3D Human Pose Estimation

Improving Human Pose Estimation Based on Stacked Hourglass Network

Structure-aware human pose estimation with graph convolutional networks

Full Scale-Aware Balanced High-Resolution Network for Multi-Person Pose Estimation

Deep Dual Consecutive Network for Human Pose Estimation