An Adaptive Stacked Hourglass Network with Kalman Filter for Estimating 2D Human Pose in Video.

Tao Hu, Chunxia Xiao, Geyong Min, Noushin Najjari
DOI: https://doi.org/10.1111/exsy.12552
IF: 3.3
2020-01-01
Expert Systems
Abstract: 2D human pose estimation is one of the main challenges in computer science and image processing. Occlusion of human joints, in particular occlusion caused by the camera angle, is a central difficulty. This paper proposes a highly accurate network that estimates 2D human poses in video frames using deep learning. We employ the Single Shot MultiBox Detector network to detect the centre position of each human within a video frame and then use the stacked hourglass network to estimate the 2D human pose. We approximate human motion as linear between frames within a certain period, and optimize the human centres using the local outlier factor and Kalman filters. The same method is applied to optimize the human pose estimates in video, which addresses the inaccurate predictions caused by occlusion of human joints. The proposed adaptive network is tested on two well-known benchmarks for human pose estimation (the MPII and Joint-annotated Human Motion Data Base datasets), and we also present qualitative 2D human pose estimation results for single and multiple people in Internet videos. The experimental results show that the proposed network is practical and achieves high accuracy in adaptively estimating 2D human pose in video.
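The abstract does not give the filter equations, so the following is only a minimal sketch of the general idea of smoothing detected human centres under a linear-motion assumption. It assumes a constant-velocity Kalman filter run independently on the x and y coordinates, one step per frame. The paper uses the local outlier factor to identify bad detections; here a simple innovation gate stands in for it, which is a different and cruder technique. The names `kalman_1d` and `smooth_track` and the parameters `q`, `r`, and `gate` are hypothetical and are not taken from the paper.

```python
import math


def kalman_1d(zs, q=0.01, r=1.0, gate=3.0):
    """Constant-velocity Kalman filter over one coordinate (illustrative only).

    zs   : measured positions, one per frame
    q    : process noise added to both state variances (assumed value)
    r    : measurement noise variance (assumed value)
    gate : measurements whose innovation exceeds gate * sqrt(S) are ignored;
           this is a stand-in for the paper's local-outlier-factor step
    """
    p, v = zs[0], 0.0                          # state: position, per-frame velocity
    p00, p01, p10, p11 = r, 0.0, 0.0, 100.0    # covariance; velocity starts uncertain
    out = [p]
    for z in zs[1:]:
        # Predict with F = [[1, 1], [0, 1]]: linear motion between frames.
        p = p + v
        n00 = p00 + p01 + p10 + p11 + q
        n01 = p01 + p11
        n10 = p10 + p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11

        # Innovation and its variance for the measurement model H = [1, 0].
        y = z - p
        s = p00 + r

        # Update only if the measurement is plausible; otherwise keep the prediction.
        if abs(y) <= gate * math.sqrt(s):
            k0, k1 = p00 / s, p10 / s
            p += k0 * y
            v += k1 * y
            p00, p01, p10, p11 = ((1 - k0) * p00, (1 - k0) * p01,
                                  p10 - k1 * p00, p11 - k1 * p01)
        out.append(p)
    return out


def smooth_track(points, **kw):
    """Smooth a list of (x, y) centres by filtering each coordinate separately."""
    xs = kalman_1d([x for x, _ in points], **kw)
    ys = kalman_1d([y for _, y in points], **kw)
    return list(zip(xs, ys))


if __name__ == "__main__":
    # A centre moving linearly, with one wildly wrong detection at frame 6.
    track = [(2.0 * i, 1.0 * i) for i in range(10)]
    track[6] = (100.0, 50.0)
    for frame, (x, y) in enumerate(smooth_track(track)):
        print(frame, round(x, 2), round(y, 2))
```

In this sketch a rejected measurement is replaced by the filter's own prediction, which is one plausible way to carry a track through a frame where the detector or pose network fails. The same per-coordinate filtering could in principle be applied to each joint position, as the abstract suggests for the pose estimates.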