VH3D-LSFM: Video-Based Human 3D Pose Estimation with Long-Term and Short-Term Pose Fusion Mechanism

Wenjin Deng, Yinglin Zheng, Hui Li, Xianwei Wang, Zizhao Wu, Ming Zeng
DOI: https://doi.org/10.1007/978-3-030-60633-6_49
2020-01-01
Abstract: Following the success of 2D human pose estimation from a single image, much recent work has focused on video-based 3D human pose estimation that exploits temporal information. In this setting, several recent works have achieved significant advances using Temporal Convolutional Networks (TCNs). However, current TCN-based approaches lack local coherence because they depend excessively on local frames and have a limited local dynamic range. As a result, they fail to estimate poses correctly in real scenes, especially scenes with high-speed motion. To tackle this problem, we design a Long-term Bank that selects and collects candidate key poses, and we further propose LSFM (Long-term and Short-term pose Fusion Mechanism) to integrate long-term pose information into the short-term convolution window, thereby enhancing the temporal coherence of neighboring local frames. Experimental results and ablation studies demonstrate that the proposed approach significantly improves the accuracy and robustness of the state-of-the-art method.
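The abstract does not describe how the Long-term Bank chooses key poses or how LSFM performs the fusion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Everything here is hypothetical: the names `LongTermBank` and `fuse_window`, the diversity-based selection rule (keep a pose only if it is far from every stored pose), FIFO eviction when the bank is full, softmax attention over bank poses keyed on the window's center frame, and the blending weight `alpha`. For readability it fuses raw 2D keypoints, whereas a real model would more plausibly fuse learned features inside the TCN.

```python
import math
from collections import deque


def pose_distance(a, b):
    """Mean per-joint Euclidean distance between two poses (lists of (x, y))."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


class LongTermBank:
    """Hypothetical bank of candidate key poses; the selection rule is assumed."""

    def __init__(self, capacity=8, min_dist=0.1):
        self.poses = deque(maxlen=capacity)  # oldest pose is dropped when full
        self.min_dist = min_dist

    def update(self, pose):
        # Assumed criterion: store a pose only if it differs from every stored
        # pose by more than min_dist, so the bank holds diverse key poses.
        if all(pose_distance(pose, k) > self.min_dist for k in self.poses):
            self.poses.append(pose)
            return True
        return False


def fuse_window(window, bank, alpha=0.3, temperature=0.1):
    """Blend each frame of a short-term window with an attended long-term pose."""
    if not bank.poses:
        return [list(frame) for frame in window]  # nothing long-term to add
    center = window[len(window) // 2]
    # Softmax over negative distances: bank poses closer to the center frame
    # receive larger weights (max-subtraction keeps exp() numerically stable).
    logits = [-pose_distance(center, k) / temperature for k in bank.poses]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_joints = len(center)
    # Attention-weighted average of the bank poses, joint by joint.
    long_term = [
        tuple(sum(w * k[j][c] for w, k in zip(weights, bank.poses)) for c in range(2))
        for j in range(n_joints)
    ]
    # Convex blend of every window frame with the same long-term pose.
    return [
        [tuple((1 - alpha) * frame[j][c] + alpha * long_term[j][c] for c in range(2))
         for j in range(n_joints)]
        for frame in window
    ]


if __name__ == "__main__":
    bank = LongTermBank(capacity=4, min_dist=0.5)
    stream = [
        [(0.0, 0.0), (1.0, 0.0)],
        [(0.0, 0.1), (1.0, 0.1)],  # near-duplicate of the first pose: skipped
        [(2.0, 2.0), (3.0, 2.0)],
    ]
    for pose in stream:
        bank.update(pose)
    print(len(bank.poses))
    print(fuse_window(stream, bank))
```

In the usage example the second pose is rejected as a near-duplicate, so the bank keeps two key poses. Because the window's center frame is almost identical to the first key pose, attention concentrates on it, and every frame in the window is pulled 30% of the way toward that pose. This illustrates, in a toy form, how information from outside the short-term window can be injected into it.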
What problem does this paper attempt to address?