Abstract: Pose estimation in crowded scenes is key to understanding human behavior in real-life applications. Most existing CNN-based pose estimation methods depend on the appearance of visible parts as cues to localize human joints. However, occlusion is typical in crowded scenes, and invisible body parts have no valid features for joint localization. Introducing prior information about the human pose structure to infer the locations of occluded parts is a natural solution to this problem. In this paper, we argue that learning structural information based on human joints alone is not enough to address human body variations and could be prone to overfitting. Viewing the human pose as a dual representation of joints and limbs, we propose a pose refinement network, termed dual graph network (DGN), that jointly learns the structural information of body joints and limbs by incorporating cooperative constraints between two branches. Specifically, our DGN has two coupled graph convolutional network (GCN) branches that model the structural information of joints and limbs. Each stage in a branch is composed of a feature aggregator for inter-branch information fusion and a GCN module for intra-branch context extraction. In addition, to enhance the modeling capacity of the GCN, we design an adaptive GCN layer (AGL), embedded in the GCN module, that handles each pose instance based on its graph structure. We also propose heatmap-guided sampling, which leverages the features of the body parts to provide rich visual features for the inference of occluded parts. We perform extensive experiments on five challenging datasets to demonstrate the effectiveness of our DGN on pose estimation. Our DGN improves performance from 67.9 to 72.4 mAP on the CrowdPose dataset with the same CNN-based pose estimator and training strategy as OPEC-Net. This result shows that, compared to OPEC-Net, which considers only joints, our DGN has a clear advantage because it considers both joints and limbs. Meanwhile, our DGN is also helpful for pose estimation on general datasets (i.e., COCO and PoseTrack) with less occlusion and mutual interference, demonstrating the generalization power of DGN in refining human poses.
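To make the dual representation of joints and limbs concrete, the following minimal sketch builds a joint graph and a limb graph from one skeleton and runs a single weight-free neighbour-averaging step on each. It is an illustration under stated assumptions, not the DGN implementation: the 5-joint skeleton, the rule that two limbs are adjacent when they share a joint, the midpoint limb feature, and the helper names `joint_adjacency`, `limb_adjacency`, and `mean_aggregate` are all hypothetical, and the feature aggregator, AGL, and heatmap-guided sampling are not modeled.

```python
# Hypothetical 5-joint skeleton; names and indices are illustrative only.
JOINTS = ["head", "neck", "l_shoulder", "r_shoulder", "pelvis"]
LIMBS = [(0, 1), (1, 2), (1, 3), (1, 4)]  # each limb connects two joints


def joint_adjacency(num_joints, limbs):
    """Joint graph: two joints are adjacent if a limb connects them."""
    adj = [[0] * num_joints for _ in range(num_joints)]
    for a, b in limbs:
        adj[a][b] = 1
        adj[b][a] = 1
    return adj


def limb_adjacency(limbs):
    """Limb graph (assumed rule): two limbs are adjacent if they share a joint."""
    n = len(limbs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if set(limbs[i]) & set(limbs[j]):
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


def mean_aggregate(adj, feats):
    """Average each node's features with those of its neighbours (no learned weights)."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [i] + [j for j, edge in enumerate(row) if edge]
        dim = len(feats[i])
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


joint_adj = joint_adjacency(len(JOINTS), LIMBS)
limb_adj = limb_adjacency(LIMBS)

# Toy 2-D joint coordinates; a limb feature is taken here as the midpoint of its two joints.
joint_xy = [[0.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 3.0]]
limb_xy = [[(joint_xy[a][d] + joint_xy[b][d]) / 2 for d in range(2)] for a, b in LIMBS]

smoothed_joints = mean_aggregate(joint_adj, joint_xy)
smoothed_limbs = mean_aggregate(limb_adj, limb_xy)
```

In this toy skeleton every limb touches the neck, so the limb graph is fully connected while the joint graph is a star. The two graphs therefore encode different neighbourhoods for the same pose, which is the kind of complementary structure the two branches are described as exploiting.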

HKE-GCN: Heatmaps-guided Keypoints Encoder and Graph Convolutional Network for Human Pose Estimation

Context-Guided Adaptive Network for Efficient Human Pose Estimation

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Hierarchical Graph Neural Network for Human Pose Estimation

Structure-aware human pose estimation with graph convolutional networks

Cascaded Pyramid Network for Multi-Person Pose Estimation

Adaptive Hypergraph Neural Network for Multi-Person Pose Estimation

Multi-person pose estimation using atrous convolution

Dual Graph Networks for Pose Estimation in Crowded Scenes

Joint graph convolution networks and transformer for human pose estimation in sports technique analysis

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

Graph U-Shaped Network with Mapping-Aware Local Enhancement for Single-Frame 3D Human Pose Estimation

Relation-balanced graph convolutional network for 3D human pose estimation

Locally Connected Network for Monocular 3D Human Pose Estimation

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Optimizing Network Structure for 3D Human Pose Estimation

Multi-Scale Supervised Network for Human Pose Estimation

HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation

High-order local connection network for 3D human pose estimation based on GCN

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention