Abstract: Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate poses in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation, due to inter-person occlusion and close human interactions. Existing top-down methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods do not use human detection and so are not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates human joints from all persons, instead of just one, in an image patch, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, allowing the network to be more robust to scale variations. Finally, the estimated 3D poses from the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we optimize at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. Our evaluations demonstrate the effectiveness of the proposed method.
Code and models are available at: https://github.com/3dpose/3D-Multi-Person-Pose

MH-Net: Multiheaded 3D Hand Pose Estimation Network with 3D Anchorsets and Improved Multiscale Vision Transformer

Multi Hybrid Extractor Network for 3D Human Pose Estimation

HMTNet: 3D Hand Pose Estimation from Single Depth Image Based on Hand Morphological Topology

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Unsupervised Universal Hierarchical Multi-Person 3D Pose Estimation for Natural Scenes

Multi-virtual View Scoring Network for 3D Hand Pose Estimation from a Single Depth Image

3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation

Hand3D: Hand Pose Estimation using 3D Neural Network

MVPointNet: Multi-View Network for 3D Object Based on Point Cloud

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

Hand Pose Estimation with Attention-and-Sequence Network

MANet: Multi-level Attention Network for 3D Human Shape and Pose Estimation

DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth

Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network

Accurate 3D Hand Pose Estimation Network Utilizing Joints Information

Enhancement and Optimisation of Human Pose Estimation with Multi-Scale Spatial Attention and Adversarial Data Augmentation

QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation

A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image

HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network

An Adaptive Viewpoint Transformation Network for 3D Human Pose Estimation