Abstract: Monocular 3D human pose estimation has made progress in recent years. Most methods focus on a single person and estimate poses in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation, due to inter-person occlusion and close human interactions. Existing top-down methods rely on human detection and thus suffer from detection errors, so they cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods do not use human detection and are therefore not affected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address these challenges, we propose integrating the top-down and bottom-up approaches to exploit the strengths of both. Our top-down network estimates the joints of all persons in an image patch instead of just one, making it robust to possibly erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, making it more robust to scale variations. Finally, the 3D poses estimated by the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we perform optimization at test time, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. Our evaluations demonstrate the effectiveness of the proposed method.
Code and models are available at https://github.com/3dpose/3D-Multi-Person-Pose.
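The abstract names three terms used in the test-time refinement (a re-projection loss, bone-length regularization, and a high-order temporal constraint) but does not give their formulas. The following is a minimal NumPy sketch of what such terms could look like, not the authors' implementation. The toy skeleton `BONES`, the pinhole camera parameters `focal` and `center`, the use of a second-order finite difference, and the weights `w_bone` and `w_temp` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton: (parent, child) joint index pairs for a 4-joint chain.
BONES = [(0, 1), (1, 2), (2, 3)]


def project(poses_3d, focal, center):
    """Pinhole projection of camera-space joints (T, J, 3) to pixels (T, J, 2)."""
    xy = poses_3d[..., :2] / poses_3d[..., 2:3]
    return focal * xy + center


def reprojection_loss(poses_3d, keypoints_2d, focal, center):
    """Mean squared pixel distance between projected 3D joints and 2D estimates."""
    diff = project(poses_3d, focal, center) - keypoints_2d
    return np.mean(np.sum(diff ** 2, axis=-1))


def bone_lengths(poses_3d):
    """Length of every bone in every frame, shape (T, num_bones)."""
    parents = [p for p, _ in BONES]
    children = [c for _, c in BONES]
    return np.linalg.norm(poses_3d[:, children] - poses_3d[:, parents], axis=-1)


def bone_length_loss(poses_3d):
    """Penalize variation of each bone's length across frames."""
    lengths = bone_lengths(poses_3d)
    return np.mean((lengths - lengths.mean(axis=0, keepdims=True)) ** 2)


def temporal_loss(poses_3d, order=2):
    """Penalize the order-th finite difference of joint trajectories over time."""
    return np.mean(np.diff(poses_3d, n=order, axis=0) ** 2)


def total_loss(poses_3d, keypoints_2d, focal, center, w_bone=1.0, w_temp=1.0):
    """Weighted sum of the three terms (weights are illustrative assumptions)."""
    return (
        reprojection_loss(poses_3d, keypoints_2d, focal, center)
        + w_bone * bone_length_loss(poses_3d)
        + w_temp * temporal_loss(poses_3d)
    )
```

In a setup like this, the estimated 3D poses would be treated as free variables and updated by a gradient-based optimizer to reduce `total_loss` on each test sequence; an autodiff framework would replace plain NumPy for that step. A rigid skeleton moving at constant velocity, compared against its own projection, yields a loss of approximately zero under all three terms, which is a quick sanity check for the sketch.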

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

KAN-HyperpointNet for Point Cloud Sequence-Based 3D Human Action Recognition

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Multi-person pose estimation using atrous convolution

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition

Center point to pose: Multiple views 3D human pose estimation for multi-person

Multi-task neural network with physical constraint for real-time multi-person 3D pose estimation from monocular camera

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

Action recognition method based on a novel keyframe extraction method and enhanced 3D convolutional neural network

Channel attention and multi-scale graph neural networks for skeleton-based action recognition

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

End-to-end Learning of Deep Convolutional Neural Network for 3D Human Action Recognition

ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action

Spatiotemporal Multi-Task Network for Human Activity Understanding

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

View-Robust Neural Networks for Unseen Human Action Recognition in Videos

Three-Dimensional Action Recognition for Basketball Teaching Coupled with Deep Neural Network