Abstract: 3D human pose estimation is a major focus area in computer vision and plays an important role in practical applications. This article summarizes the framework and research progress of 3D human pose estimation from monocular RGB images and videos. An overall perspective on methods integrated with deep learning is introduced. A novel analysis framework, organized by image-based and video-based inputs, is proposed. From this viewpoint, common problems are discussed. The diversity of human postures usually leads to problems such as occlusion and ambiguity, and the lack of training datasets often results in poor generalization ability of the model. Regression methods are crucial for solving such problems. For image-based input, the multi-view method is commonly used to solve occlusion problems, and it is analyzed comprehensively here. For video-based input, prior knowledge of the restricted motion of the human body is used to predict human postures. In addition, structural constraints are widely used as prior knowledge. Furthermore, weakly supervised learning methods are studied and discussed for both types of input as a way to improve model generalization. The problem of insufficient training data must also be considered, especially because 3D datasets are usually biased and limited. Finally, emerging and popular datasets and evaluation indicators are discussed. The characteristics of the datasets and the relationships among the indicators are explained and highlighted. This article can therefore be useful and instructive for researchers who are new to the field and find it confusing. In addition, by providing an overview of 3D human pose estimation, this article organizes and refines recent studies in the area. It describes the core problems and commonly used methods, and discusses the scope for further research.

Human Pose Estimation with Regression by Fusing Multi-View Visual Information

Human Pose Regression Through Multiview Visual Fusion.

3D Human Pose Estimation Based on Multi View Information Fusion

Research on 3D Human Pose Estimation Technique Based on Multi-View Information Fusion

3D Human Pose Estimation from Deep Multi-View 2D Pose

Discriminative Estimation of 3D Human Pose Using Gaussian Processes.

3D Human pose estimation from video via multi-scale multi-level spatial temporal features

3D Human Pose Estimation in Motion Based on Multi-Stage Regression

Dual-view 3D Human Pose Estimation Without Camera Parameters for Action Recognition

Human Pose Estimation from Monocular Images: A Comprehensive Survey.

Towards Locality Similarity Preserving to 3D Human Pose Estimation.

Overview of 3D Human Pose Estimation

3d Body Pose And Shape Estimation From Multi-View Images With Limb Geometric Constraint

Human Pose Estimation Based on Cross-View Feature Fusion

An Adaptive Viewpoint Transformation Network for 3D Human Pose Estimation

Reconstructing 3D human pose and shape from a single image and sparse IMUs

Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation

A Survey on Monocular 3D Human Pose Estimation

3D Human Body Shape and Pose Estimation from Depth Image.

3D Human Pose Estimation with Single Image and Inertial Measurement Unit (IMU) Sequence

3D Human Pose Estimation from Multiple Dynamic Views Via Single-view Pretraining with Procrustes Alignment