Abstract: Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate the pose in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation because of inter-person occlusion and close human interactions. Existing top-down methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods do not use human detection and so are not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit the strengths of both. Our top-down network estimates human joints from all persons in an image patch instead of only one, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, making it more robust to scale variations. Finally, the 3D poses estimated by the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we perform optimization at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. Our evaluations demonstrate the effectiveness of the proposed method.
Code and models are available at https://github.com/3dpose/3D-Multi-Person-Pose.
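
The test-time refinement mentioned in the abstract combines a temporal constraint, a re-projection loss, and bone-length regularization. The following is a minimal sketch of what such terms might look like for a single person, not the released implementation. The function names, the pinhole camera model, the use of a second-order finite difference as the temporal term, the per-bone variance as the bone-length term, and the weights are all illustrative assumptions.

```python
import numpy as np


def project(points_3d, focal, center):
    """Pinhole projection (assumed camera model).

    points_3d: (T, J, 3) camera-space joints with positive depth.
    Returns (T, J, 2) pixel coordinates.
    """
    xy = points_3d[..., :2] / points_3d[..., 2:3]
    return focal * xy + center


def reprojection_loss(points_3d, keypoints_2d, focal, center):
    """Mean squared pixel distance between projected 3D joints and 2D keypoints."""
    diff = project(points_3d, focal, center) - keypoints_2d
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def bone_length_loss(points_3d, bones):
    """Penalize each bone's length for deviating from its mean over the sequence.

    bones: list of hypothetical (parent, child) joint index pairs.
    """
    bones = np.asarray(bones)
    parents, children = bones[:, 0], bones[:, 1]
    lengths = np.linalg.norm(points_3d[:, children] - points_3d[:, parents], axis=-1)
    mean_lengths = lengths.mean(axis=0, keepdims=True)  # shape (1, num_bones)
    return float(np.mean((lengths - mean_lengths) ** 2))


def temporal_loss(points_3d):
    """Second-order finite difference (acceleration) penalty.

    This stands in for the "high-order temporal constraint"; the actual
    order and form used by the authors are not given in the abstract.
    """
    accel = points_3d[2:] - 2.0 * points_3d[1:-1] + points_3d[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))


def refinement_loss(points_3d, keypoints_2d, bones, focal, center,
                    w_proj=1.0, w_bone=1.0, w_temp=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w_proj * reprojection_loss(points_3d, keypoints_2d, focal, center)
            + w_bone * bone_length_loss(points_3d, bones)
            + w_temp * temporal_loss(points_3d))
```

In practice, a refinement step of this kind would minimize `refinement_loss` with respect to `points_3d` using a gradient-based optimizer in an automatic-differentiation framework; the NumPy version here only evaluates the objective.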

Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision

MORE: Simultaneous Multi-View 3D Object Recognition and Pose Estimation

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation

A Two-Stage Monocular Vision Detection Method for 6D Pose Estimation in Multi-Heterogeneous Robot Systems

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

Simultaneous face detection and 360 degree headpose estimation

The challenge of simultaneous object detection and pose estimation: a comparative study

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Monocular 3D Detection With Geometric Constraint Embedding and Semi-Supervised Training

Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring

Scene Recognition and Object Detection in a Unified Convolutional Neural Network on a Mobile Manipulator

CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

Two-Phase Approach for Monocular Object Detection and 6-DoF Pose Estimation

On Boosting Single-Frame 3D Human Pose Estimation Via Monocular Videos

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

One Point, One Object: Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation

Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses

Simultaneous Face Detection And Head Pose Estimation: A Fast And Unified Framework

A Pose Estimation Algorithm for Multimodal Data Fusion

PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model

MH-Net: Multiheaded 3D Hand Pose Estimation Network with 3D Anchorsets and Improved Multiscale Vision Transformer