Temporal Feature Correlation for Human Pose Estimation in Videos

Wentian Li, Xiangyu Xu, Yu-Jin Zhang
DOI: https://doi.org/10.1109/icip.2019.8803797
2019-01-01
Abstract: Effectively utilizing temporal information is critical for human pose estimation in videos. Recent methods either neglect the displacements of keypoints across video frames or rely on time-consuming optical flow estimation when fusing temporal information. By contrast, we propose a flow-free and displacement-aware algorithm for pose estimation in videos. Our method is based on the observation that the appearance of the body keypoints remains almost unchanged throughout a video. This motivates us to exploit the temporal visual consistency of keypoints via temporal feature correlation, which establishes sparse correspondences between the keypoints in neighboring frames. Specifically, we first extract keypoint features from the previous frame, which are treated as exemplars to search for on the intermediate feature map of the current frame. We then conduct temporal feature correlation for the keypoint search, and the resulting correlation maps are combined with the convolutional features to further guide heatmap estimation. Extensive experiments demonstrate that the proposed method compares favorably against state-of-the-art approaches on both the sub-JHMDB and Penn Action datasets. More importantly, our method is robust to large keypoint displacements and can be applied to videos with fast motion.
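The pipeline described in the abstract (exemplar extraction, correlation-based search, fusion with convolutional features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the network, the similarity measure, or the fusion operator. The sketch assumes cosine similarity for the correlation and channel-wise concatenation for the fusion, and the function names `extract_keypoint_features`, `temporal_feature_correlation`, and `fuse_with_features` are hypothetical. Random arrays stand in for real feature maps.

```python
import numpy as np


def extract_keypoint_features(prev_feat, keypoints):
    """Sample one exemplar vector per keypoint from the previous frame.

    prev_feat: array of shape (C, H, W); keypoints: list of (row, col).
    Returns an array of shape (K, C).
    """
    return np.stack([prev_feat[:, r, c] for r, c in keypoints])


def temporal_feature_correlation(exemplars, cur_feat, eps=1e-8):
    """Correlate each exemplar with every location of the current feature map.

    Assumption: cosine similarity is used as the correlation measure.
    Returns correlation maps of shape (K, H, W).
    """
    ex = exemplars / (np.linalg.norm(exemplars, axis=1, keepdims=True) + eps)
    cf = cur_feat / (np.linalg.norm(cur_feat, axis=0, keepdims=True) + eps)
    return np.einsum("kc,chw->khw", ex, cf)


def fuse_with_features(cur_feat, corr_maps):
    """Assumption: fusion is a channel-wise concatenation, giving (C + K, H, W)."""
    return np.concatenate([cur_feat, corr_maps], axis=0)


# Toy data: the current frame is the previous frame shifted by (+3, -2),
# which stands in for keypoint displacement between neighboring frames.
rng = np.random.default_rng(0)
prev_feat = rng.standard_normal((8, 16, 16))
cur_feat = np.roll(prev_feat, shift=(3, -2), axis=(1, 2))
keypoints = [(4, 5), (10, 3)]

exemplars = extract_keypoint_features(prev_feat, keypoints)
corr_maps = temporal_feature_correlation(exemplars, cur_feat)
fused = fuse_with_features(cur_feat, corr_maps)

# The peak of each correlation map gives the keypoint's location in the current frame.
peaks = [
    tuple(int(i) for i in np.unravel_index(np.argmax(m), m.shape))
    for m in corr_maps
]
print(peaks)
print(fused.shape)
```

In this toy setting each correlation map peaks at the displaced keypoint location without any optical flow, which is the sense in which the search is flow-free and displacement-aware. In the method described above, the fused tensor would feed the subsequent heatmap estimation layers, which the sketch omits.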