Abstract: Pose estimation in crowded scenes is key to understanding human behavior in real-life applications. Most existing CNN-based pose estimation methods depend on the appearance of visible parts as cues to localize human joints. However, occlusion is typical in crowded scenes, and invisible body parts have no valid features for joint localization. Introducing prior information about the human pose structure to infer the locations of occluded parts is a natural solution to this problem. In this paper, we argue that learning structural information based on human joints alone is not enough to address human body variations and could be prone to overfitting. Viewing the human pose as a dual representation of joints and limbs, we propose a pose refinement network, coined dual graph network (DGN), to jointly learn the structural information of body joints and limbs by incorporating the cooperative constraints between the two branches. Specifically, our DGN has two coupled graph convolutional network (GCN) branches to model the structural information of joints and limbs. Each stage in a branch is composed of a feature aggregator and a GCN module for inter-branch information fusion and intra-branch context extraction, respectively. In addition, to enhance the modeling capacity of the GCN, we design an adaptive GCN layer (AGL), embedded in the GCN module, that handles each pose instance based on its graph structure. We also propose heatmap-guided sampling, which leverages the features of body parts to provide rich visual features for inferring occluded parts. We perform extensive experiments on five challenging datasets to demonstrate the effectiveness of our DGN on pose estimation. Our DGN obtains a significant performance improvement, from 67.9 to 72.4 mAP on the CrowdPose dataset, with the same CNN-based pose estimator and training strategy as OPEC-Net. This shows that, compared to OPEC-Net, which considers only joints, our DGN has a clear advantage because it considers joints and limbs together. Meanwhile, our DGN is also helpful for pose estimation on general datasets (i.e., COCO and PoseTrack) with less occlusion and mutual interference, demonstrating the generalization power of DGN in refining human poses.
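To make the dual joint/limb representation concrete, the following is a minimal sketch and not the authors' DGN. It assumes a hypothetical 5-joint toy skeleton, builds one graph whose nodes are joints and one whose nodes are limbs, and applies a single weight-free, symmetrically normalized propagation step of the kind used in a standard GCN. The learned weights, the feature aggregator, the adaptive GCN layer, and heatmap-guided sampling are all omitted; every name below (`JOINTS`, `LIMBS`, `limb_features`, and so on) is illustrative only.

```python
from math import sqrt

# Hypothetical 5-joint toy skeleton (an assumption, not a dataset keypoint layout).
JOINTS = ["head", "neck", "l_shoulder", "r_shoulder", "hip"]
# Each limb is an edge between two joint indices.
LIMBS = [(0, 1), (1, 2), (1, 3), (1, 4)]


def joint_adjacency(num_joints, limbs):
    """Joint graph: joints are nodes, adjacent when a limb connects them."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for a, b in limbs:
        adj[a][b] = adj[b][a] = 1.0
    return adj


def limb_adjacency(limbs):
    """Limb graph: limbs are nodes, adjacent when they share a joint."""
    n = len(limbs)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if set(limbs[i]) & set(limbs[j]):
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm_adj, feats):
    """One weight-free step: each node takes a weighted sum of itself and its neighbours."""
    n, d = len(feats), len(feats[0])
    return [[sum(norm_adj[i][k] * feats[k][c] for k in range(n))
             for c in range(d)] for i in range(n)]


def limb_features(joint_feats, limbs):
    """Stand-in for joint-to-limb fusion: mean of the two endpoint joint features."""
    return [[(x + y) / 2.0 for x, y in zip(joint_feats[a], joint_feats[b])]
            for a, b in limbs]


# Toy 2D joint coordinates used as node features.
joint_xy = [[0.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 3.0]]
A_j = normalize(joint_adjacency(len(JOINTS), LIMBS))
A_l = normalize(limb_adjacency(LIMBS))
refined_joints = propagate(A_j, joint_xy)
refined_limbs = propagate(A_l, limb_features(joint_xy, LIMBS))
```

In this toy skeleton every limb touches the neck, so the limb graph is fully connected and each limb mixes information from all others, while the joint graph is a star centred on the neck. The two graphs therefore encode different neighbourhoods over the same pose, which is the intuition behind treating joints and limbs as complementary views.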

RSGNet: Relation Based Skeleton Graph Network for Crowded Scenes Pose Estimation

Dual Graph Networks for Pose Estimation in Crowded Scenes

Learning Recurrent Structure-Guided Attention Network for Multi-person Pose Estimation

Multi-person Pose Estimation Based on Graph Grouping Optimization

CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark

3D Human Pose Estimation Via Graph Extended Spatio-Temporal Convolutional Network

I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose Estimation

Learning Joint Structure for Human Pose Estimation

Relation-balanced graph convolutional network for 3D human pose estimation

Regular Splitting Graph Network for 3D Human Pose Estimation

DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation

SRNet: Structural Relation-aware Network for Head Pose Estimation

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

Multi-person 3D pose estimation from unlabelled data

Relation-Based Associative Joint Location for Human Pose Estimation in Videos

Structure-aware human pose estimation with graph convolutional networks

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Towards Scalable Scenarios Human Pose Estimation Via Two-Stage Hierarchical Network

QuickPose: Real-time Multi-view Multi-person Pose Estimation in Crowded Scenes

Joints Relation Inference Network for Skeleton-Based Action Recognition

Human pose estimation in crowded scenes using Keypoint Likelihood Variance Reduction