Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Lijuan Zhou, Xiang Meng, Zhihuan Liu, Mengqi Wu, Zhimin Gao, Pichao Wang
2023-10-20
Abstract: Human pose analysis has garnered significant attention within both the research community and practical applications, owing to its expanding array of uses, including gaming, video surveillance, sports performance analysis, and human-computer interactions, among others. The advent of deep learning has significantly improved the accuracy of pose capture, making pose-based applications increasingly practical. This paper presents a comprehensive survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation involves the determination of human joint positions from images or image sequences. Pose tracking is an emerging research direction aimed at generating consistent human pose trajectories over time. Action recognition, on the other hand, targets the identification of action types using pose estimation or tracking data. These three tasks are intricately interconnected, with the latter often reliant on the former. In this survey, we comprehensively review related works, spanning from single-person pose estimation to multi-person pose estimation, from 2D pose estimation to 3D pose estimation, from single image to video, from mining temporal context gradually to pose tracking, and lastly from tracking to pose-based action recognition. As a survey centered on the application of deep learning to pose analysis, we explicitly discuss both the strengths and limitations of existing techniques. Notably, we emphasize methodologies for integrating these three tasks into a unified framework within video sequences. Additionally, we explore the challenges involved and outline potential directions for future research.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses three closely related tasks: human pose estimation, pose tracking, and pose-based action recognition. Specifically:

1. **Comprehensive Review**: It reviews deep learning methods for all three tasks, progressing from single-person to multi-person pose estimation, from 2D to 3D pose estimation, from single images to video sequences, from gradually mining temporal context to pose tracking, and finally from tracking to pose-based action recognition.
2. **Method Integration**: It emphasizes methods that integrate the three tasks into a unified framework and examines how that integration can be achieved in video sequences.
3. **Strengths and Limitations**: It discusses the strengths and limitations of existing techniques in detail, particularly with a view to combining the tasks for more practical applications.
4. **Future Directions**: It examines the open challenges in these tasks and outlines potential research directions.

The review is distinctive in offering what it describes as the first in-depth joint analysis of these three tasks, which are closely related but have usually been treated separately, with a focus on methodological developments in the deep learning era. It therefore serves both as an overview of recent research and as guidance for future work.