Abstract: The rise of deep learning has broadly promoted the practical application of artificial intelligence in production and daily life. In computer vision, many human-centered applications, such as video surveillance, human-computer interaction, and digital entertainment, rely heavily on accurate and efficient human pose estimation. Inspired by the remarkable achievements in learning-based 2D human pose estimation, numerous studies are devoted to 3D human pose estimation via deep learning. Against this backdrop, this paper provides an extensive survey of recent literature on deep learning methods for 3D human pose estimation, in order to present how this research has developed, track the latest research trends, and analyze the characteristics of the different types of methods. The literature is reviewed along the general pipeline of 3D human pose estimation, which consists of human body modeling, learning-based pose estimation, and regularization for refinement. Unlike existing reviews of the same topic, this paper focuses on deep learning-based methods. Learning-based pose estimation is discussed in two categories: single-person and multi-person. Each is further divided by data type into image-based and video-based methods. Moreover, because data is so significant for learning-based methods, this paper also surveys 3D human pose estimation methods according to a taxonomy of supervision forms. Finally, this paper lists the current, widely used datasets and compares the performance of the reviewed methods. Based on this survey, it can be concluded that each branch of 3D human pose estimation starts with fully-supervised methods, and that there is still much room for multi-person pose estimation based on other forms of supervision, from both images and videos. Despite the significant development of 3D human pose estimation via deep learning, the inherent ambiguity and occlusion problems remain challenging issues that need to be better addressed.

Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

Marker-Less 3D Human Motion Capture With Monocular Image Sequence And Height-Maps

Visual-based Positioning and Pose Estimation

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation

Human Pose Estimation in Monocular Omnidirectional Top-View Images

Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods

Deep Learning Methods for 3D Human Pose Estimation under Different Supervision Paradigms: A Survey

Human Pose Estimation Using Deep Structure Guided Learning

Vision-Based Human Pose Estimation via Deep Learning: A Survey

Learning to Estimate Pose by Watching Videos

3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

Deep Dual Consecutive Network for Human Pose Estimation

Motion Capture for Sporting Events Based on Graph Convolutional Neural Networks and Single Target Pose Estimation Algorithms

Using Deep Neural Networks for Human Fall Detection Based on Pose Estimation

Video Based Fall Detection Using Human Poses

Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision

A Novel Convolutional Neural Network for Head Detection and Pose Estimation in Complex Environments from Single-Depth Images

Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering