Abstract: Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate the pose in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation because of inter-person occlusion and close human interactions. Existing top-down multi-person methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods do not use human detection and so are not affected by detection errors, but because they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit the strengths of both. Our top-down network estimates the human joints of all persons in an image patch instead of just one, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, making it more robust to scale variations. Finally, the 3D poses estimated by the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we also optimize at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone length regularization. Our evaluations demonstrate the effectiveness of the proposed method.
Code and models are available at https://github.com/3dpose/3D-Multi-Person-Pose.

Online Monitoring for Neural Network Based Monocular Pedestrian Pose Estimation

Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss

Adversarial Attacks on Monocular Pose Estimation

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

Progress and limitations of deep networks to recognize objects in unusual poses

Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation

Robust self-supervised monocular visual odometry based on prediction-update pose estimation network

Is my Driver Observation Model Overconfident? Input-guided Calibration Networks for Reliable and Interpretable Confidence Estimates

Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring

OnionNet: Single-View Depth Prediction and Camera Pose Estimation for Unlabeled Video

Human Pose Estimation in Monocular Omnidirectional Top-View Images

An Online Calibration Method for Robust Multi-Modality 3D Object Detection

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

Lightweight convolutional neural network for real-time 3D object detection in road and railway environments

Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search

3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns

An Attention-Based Deep Learning Architecture for Real-Time Monocular Visual Odometry: Applications to GPS-free Drone Navigation

Monocular 3D Human Pose Markerless Systems for Gait Assessment

SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery

Monitizer: Automating Design and Evaluation of Neural Network Monitors