Abstract:Monocular 3D human pose estimation has made progress in recent years. Most of the methods focus on single persons, which estimate the poses in the person-centric coordinates, i.e., the coordinates based on the center of the target person. Hence, these methods are inapplicable for multi-person 3D pose estimation, where the absolute coordinates (e.g., the camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single pose estimation, due to inter-person occlusion and close human interactions. Existing top-down multi-person methods rely on human detection (i.e., top-down approach), and thus suffer from the detection errors and cannot produce reliable pose estimation in multi-person scenes. Meanwhile, existing bottom-up methods that do not use human detection are not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons in small scales. To address all these challenges, we propose the integration of top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates human joints from all persons instead of one in an image patch, making it robust to possible erroneous bounding boxes. Our bottom-up network incorporates human-detection based normalized heatmaps, allowing the network to be more robust in handling scale variations. Finally, the estimated 3D poses from the top-down and bottom-up networks are fed into our integration network for final 3D poses. To address the common gaps between training and testing data, we do optimization during the test time, by refining the estimated 3D human poses using high-order temporal constraint, re-projection loss, and bone length regularizations. Our evaluations demonstrate the effectiveness of the proposed method. Code and models are available: <a class="link-external link-https" href="https://github.com/3dpose/3D-Multi-Person-Pose" rel="external noopener nofollow">this https URL</a>.

Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods

Comparison of marker-less 2D image-based methods for infant pose estimation

Towards human-level performance on automatic pose estimation of infant spontaneous movements

A self-supervised spatio-temporal attention network for video-based 3D infant pose estimation

Estimation of human body 3D pose for parent-infant interaction settings using azure Kinect and OpenPose

Three-Dimensional Pose Estimation of Infants Lying Supine Using Data from a Kinect Sensor with Low Training Cost

Preterm infants' pose estimation with spatio-temporal features

Under the Cover Infant Pose Estimation using Multimodal Data

Deep-learning for automated markerless tracking of infants general movements

AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation

Automatic Posture and Movement Tracking of Infants with Wearable Movement Sensors

Preterm infants' limb-pose estimation from depth images using convolutional neural networks

A pose estimation for motion tracking of infants cerebral palsy

Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

Supine Infant Pose Estimation Via Single Depth Image.

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation

DeepLabCut: markerless pose estimation of user-defined body parts with deep learning

Accurate prediction of neurologic changes in critically ill infants using pose AI

3D Human Pose Estimation from Multiple Dynamic Views Via Single-view Pretraining with Procrustes Alignment

Towards Accurate Markerless Human Shape and Pose Estimation over Time