Abstract: Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate the pose in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation due to inter-person occlusion and close human interactions. Existing top-down methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods do not use human detection and are therefore not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates the joints of all persons in an image patch instead of only one, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, making it more robust to scale variations. Finally, the 3D poses estimated by the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we perform optimization at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. Our evaluations demonstrate the effectiveness of the proposed method.
Code and models are available at https://github.com/3dpose/3D-Multi-Person-Pose.

CosyPose: Consistent multi-view multi-object 6D pose estimation

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views

6D Pose Estimation for Textureless Objects on RGB Frames using Multi-View Optimization

Spatial and temporal consistency learning for monocular 6D pose estimation

Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects

Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation

MH6D: Multi-Hypothesis Consistency Learning for Category-Level 6-D Object Pose Estimation

Multi-view object pose estimation from correspondence distributions and epipolar geometry

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

BOP: Benchmark for 6D Object Pose Estimation

BOP-Distrib: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Multi-View Keypoints for Reliable 6D Object Pose Estimation

OnePose: One-Shot Object Pose Estimation Without CAD Models

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation

CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects