Abstract:Monocular 3D human pose estimation has made progress in recent years. Most of the methods focus on single persons, which estimate the poses in the person-centric coordinates, i.e., the coordinates based on the center of the target person. Hence, these methods are inapplicable for multi-person 3D pose estimation, where the absolute coordinates (e.g., the camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single pose estimation, due to inter-person occlusion and close human interactions. Existing top-down multi-person methods rely on human detection (i.e., top-down approach), and thus suffer from the detection errors and cannot produce reliable pose estimation in multi-person scenes. Meanwhile, existing bottom-up methods that do not use human detection are not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons in small scales. To address all these challenges, we propose the integration of top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates human joints from all persons instead of one in an image patch, making it robust to possible erroneous bounding boxes. Our bottom-up network incorporates human-detection based normalized heatmaps, allowing the network to be more robust in handling scale variations. Finally, the estimated 3D poses from the top-down and bottom-up networks are fed into our integration network for final 3D poses. To address the common gaps between training and testing data, we do optimization during the test time, by refining the estimated 3D human poses using high-order temporal constraint, re-projection loss, and bone length regularizations. Our evaluations demonstrate the effectiveness of the proposed method. Code and models are available: <a class="link-external link-https" href="https://github.com/3dpose/3D-Multi-Person-Pose" rel="external noopener nofollow">this https URL</a>.

YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

MDA-YOLO Person: a 2D human pose estimation model based on YOLO detection framework

YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation

YOLO-based GNN for Multi-Person Pose Estimation

KSL-POSE: A Real-Time 2D Human Pose Estimation Method Based on Modified YOLOv8-Pose Framework

Research on Multi-Person Pose Estimation Based on YOLO and Decoupled Multi-Level Feature Layers Fusion.

Research on Human Posture Estimation Algorithm Based on YOLO-Pose

Object Pose Estimation Based on Improved YOLOX Algorithm

Human Pose Estimation Based on Lightweight Multi-Scale Coordinate Attention

Improved YOLO-Pose Crowd Pose Estimation.

Unified End-to-End YOLOv5-HR-TCM Framework for Automatic 2D/3D Human Pose Estimation for Real-Time Applications

An enhanced real-time human pose estimation method based on modified YOLOv8 framework

YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation

Yoga Pose Estimation Using YOLO Model

AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Multi-Person Pose Estimation with Accurate Heatmap Regression and Greedy Association

YOLOv8-PoseBoost: Advancements in Multimodal Robot Pose Keypoint Detection

A Compact and Powerful Single-Stage Network for Multi-Person Pose Estimation

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

RFA-YOLO-POSE: A Fusion Algorithm for Pose Detection and Object Identification Amidst Complex Crowds