ICRFormer: an Improving Cos-Reweighting Transformer for 3D Human Pose Estimation in Video

Kaixu Zhang, Xiaoming Luan, Tafseer Haider Shah Syed, Xuezhi Xiang
DOI: https://doi.org/10.1109/ccdc58219.2023.10326602
2023-01-01
Abstract: Monocular 3D human pose estimation (HPE) is a difficult task due to depth ambiguity and occlusion. Recent methods exploit spatial-temporal information to alleviate these problems. However, they usually adopt the original Transformer architecture and ignore the impact of efficient Transformers on 3D HPE. Therefore, we propose an improving Cos-Reweighting Transformer (ICRFormer) to estimate 3D human pose with better performance. Specifically, we first analyze the general expression of attention and propose an efficient Transformer architecture named Enhanced Transformer (EFormer). Then, to further extract the associations among different sequences, we propose a Cos-Reweighting module that strengthens the correlation calculation. Extensive experiments are conducted on two commonly used datasets, Human3.6M and MPI-INF-3DHP, to evaluate ICRFormer. The results show that our model achieves competitive performance on Human3.6M and obtains a state-of-the-art result on MPI-INF-3DHP. Compared with the most recent method, MHFormer, our model outperforms it by 0.2% in MPJPE on the Human3.6M dataset and by 43.4% in MPJPE on MPI-INF-3DHP.
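The abstract does not give ICRFormer's equations. As a rough illustration only, the sketch below shows one well-known way to write attention in its general form, O_i = sum_j sim(Q_i, K_j) V_j / sum_j sim(Q_i, K_j), with a cos-based re-weighting of the similarity, following cosFormer (Qin et al., ICLR 2022). We assume, without confirmation from the paper, that ICRFormer's Cos-Reweighting idea is related; the function names, the ReLU kernel and the horizon M = N are assumptions, not the paper's EFormer or Cos-Reweighting module. The identity cos(a - b) = cos a cos b + sin a sin b lets the keys and values be aggregated once, so the cost grows linearly with sequence length instead of quadratically.

```python
import numpy as np


def _reweight(x: np.ndarray, m: int):
    """Split a non-negative feature map into cos- and sin-weighted parts by position."""
    n = x.shape[0]
    angle = np.pi / 2 * np.arange(n) / m  # position i -> pi * i / (2M)
    return x * np.cos(angle)[:, None], x * np.sin(angle)[:, None]


def cos_reweighted_attention(q, k, v, eps=1e-6):
    """Linear-time cos-reweighted attention (cosFormer-style sketch, not ICRFormer's code)."""
    n = q.shape[0]
    m = n  # assumption: re-weighting horizon equals the sequence length
    q_pos, k_pos = np.maximum(q, 0.0), np.maximum(k, 0.0)  # ReLU kernel replaces softmax
    q_cos, q_sin = _reweight(q_pos, m)
    k_cos, k_sin = _reweight(k_pos, m)
    # cos(a - b) = cos a cos b + sin a sin b: aggregate K^T V once, O(N d^2)
    num = q_cos @ (k_cos.T @ v) + q_sin @ (k_sin.T @ v)
    den = q_cos @ k_cos.sum(axis=0) + q_sin @ k_sin.sum(axis=0)
    return num / (den[:, None] + eps)


def cos_reweighted_attention_quadratic(q, k, v, eps=1e-6):
    """Reference O(N^2) version that builds the full re-weighted similarity matrix."""
    n = q.shape[0]
    idx = np.arange(n)
    weights = np.cos(np.pi / 2 * (idx[:, None] - idx[None, :]) / n)  # cos(pi (i - j) / (2N))
    sim = (np.maximum(q, 0.0) @ np.maximum(k, 0.0).T) * weights
    return (sim @ v) / (sim.sum(axis=1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
    # Both forms compute the same attention output; only the cost differs.
    print(np.allclose(cos_reweighted_attention(q, k, v),
                      cos_reweighted_attention_quadratic(q, k, v)))
```

Because |i - j| / N < 1, every re-weighting factor lies in (0, 1], so nearby frames receive larger weights than distant ones while all weights stay non-negative.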
What problem does this paper attempt to address?