Abstract: Human pose analysis has garnered significant attention in both research and practical applications, owing to its expanding range of uses, including gaming, video surveillance, sports performance analysis, and human-computer interaction. The advent of deep learning has significantly improved the accuracy of pose capture, making pose-based applications increasingly practical. This paper presents a comprehensive survey of pose-based applications that use deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation determines the positions of human joints from images or image sequences. Pose tracking is an emerging research direction that aims to generate consistent human pose trajectories over time. Action recognition, in turn, identifies action types from pose estimation or tracking data. These three tasks are closely interconnected, with each later task often relying on the earlier ones. In this survey, we comprehensively review related works, progressing from single-person to multi-person pose estimation, from 2D to 3D pose estimation, from single images to video, from gradually mining temporal context to pose tracking, and finally from tracking to pose-based action recognition. As a survey centered on the application of deep learning to pose analysis, we explicitly discuss both the strengths and limitations of existing techniques. Notably, we emphasize methodologies for integrating these three tasks into a unified framework for video sequences. Additionally, we explore the challenges involved and outline potential directions for future research.
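The abstract describes pose tracking as producing consistent pose trajectories over time on top of per-frame pose estimates. As a minimal sketch of that dependency, assuming each frame's poses have already been estimated as lists of (x, y) joints in a fixed joint order, the hypothetical `track_poses` function below links poses across consecutive frames by greedy matching on mean joint distance. This is a simple illustrative baseline, not a method from the survey, and the `max_dist` threshold of 50 (pixels) is an arbitrary assumption.

```python
import math

# Hypothetical representation: a pose is a list of (x, y) joint coordinates,
# and every pose uses the same joint order.
Pose = list[tuple[float, float]]


def mean_joint_distance(a: Pose, b: Pose) -> float:
    """Average Euclidean distance between corresponding joints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def track_poses(frames: list[list[Pose]], max_dist: float = 50.0) -> list[list[int]]:
    """Greedily assign a track ID to every pose in every frame (illustrative only).

    Each pose is linked to the closest unmatched pose from the previous frame
    if their mean joint distance is at most max_dist; otherwise it starts a
    new track.
    """
    next_id = 0
    prev_poses: list[Pose] = []
    prev_ids: list[int] = []
    all_ids: list[list[int]] = []
    for poses in frames:
        # Score every (current, previous) pair and consider the cheapest first.
        pairs = sorted(
            (mean_joint_distance(cur, prev), i, j)
            for i, cur in enumerate(poses)
            for j, prev in enumerate(prev_poses)
        )
        ids = [-1] * len(poses)
        used_prev: set[int] = set()
        for dist, i, j in pairs:
            if dist > max_dist:
                break  # all remaining pairs are even farther apart
            if ids[i] == -1 and j not in used_prev:
                ids[i] = prev_ids[j]  # continue an existing trajectory
                used_prev.add(j)
        # Poses left unmatched start new trajectories.
        for i in range(len(poses)):
            if ids[i] == -1:
                ids[i] = next_id
                next_id += 1
        prev_poses, prev_ids = poses, ids
        all_ids.append(ids)
    return all_ids


frames = [
    [[(0, 0), (10, 0)], [(100, 100), (110, 100)]],
    [[(102, 101), (112, 101)], [(1, 1), (11, 1)]],
]
print(track_poses(frames))  # [[0, 1], [1, 0]]
```

Here the two people swap list positions in the second frame, yet each keeps its ID. Because only the immediately preceding frame is consulted, a person missed for a single frame would receive a new ID; the resulting ID sequences are the kind of trajectory data that pose-based action recognition can consume.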

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

Human Action Recognition Using Deep Learning Methods

Human Action Recognition From Digital Videos Based on Deep Learning

Recognizing Human Actions As the Evolution of Pose Estimation Maps

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

An Approach to Pose-Based Action Recognition

Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Exploiting deep residual networks for human action recognition from skeletal data

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

End-to-end Learning of Deep Convolutional Neural Network for 3D Human Action Recognition

Deep Dual Consecutive Network for Human Pose Estimation

Human Pose Estimation Using Deep Structure Guided Learning

Human action recognition using a dynamic Bayesian action network with 2D part models

Explore Human Parsing Modality for Action Recognition

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

Joint Dynamic Pose Image and Space Time Reversal for Human Action Recognition from Videos

A very deep sequences learning approach for human action recognition

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition