A 2D Human Pose Estimation Method Based On Visual Transformer

Yaping Zou, Kunming Zhou, Weike Yi, Ling Chen, Liangdong Wu
DOI: https://doi.org/10.1145/3627341.3630373
2023-08-25
Abstract: Two-dimensional human pose estimation is the basis of human behavior understanding, but predicting a reasonable three-dimensional human pose sequence remains a challenging problem. To address this, the paper proposes DEFormer, a pose estimation model based on ViT (Vision Transformer). DEFormer uses a distribution-aware coordinate representation of keypoints to reduce quantization errors, and combines the original encoding module with an efficient encoding module to construct a lighter two-stage model. Experiments on the CrowdPose dataset and a self-constructed dataset of human motion in campus scenes show that the lightweight DEFormer model achieves a maximum average accuracy of 85.9% for human pose estimation, demonstrating more accurate pose estimation performance.
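The quantization error mentioned in the abstract can be illustrated with a small sketch. This is not DEFormer's actual coordinate representation, which the abstract does not specify; it swaps in a generic expectation-based (soft-argmax style) decoder as a stand-in. The 1D Gaussian heatmap, the function names, and the values of `true_x`, `width`, and `sigma` are all assumptions chosen for illustration.

```python
import math

def gaussian_heatmap(true_x, width, sigma=1.0):
    """Sample a 1D Gaussian centred at true_x on integer pixel positions.

    Hypothetical stand-in for a keypoint heatmap; not taken from the paper.
    """
    return [math.exp(-((i - true_x) ** 2) / (2 * sigma ** 2)) for i in range(width)]

def decode_argmax(heatmap):
    """Integer decoding: index of the largest response (quantized to the pixel grid)."""
    return max(range(len(heatmap)), key=lambda i: heatmap[i])

def decode_expectation(heatmap):
    """Sub-pixel decoding: expected position under the normalized heatmap."""
    total = sum(heatmap)
    return sum(i * v for i, v in enumerate(heatmap)) / total

true_x = 7.3  # assumed ground-truth keypoint coordinate, between two pixels
heatmap = gaussian_heatmap(true_x, width=16)

argmax_x = decode_argmax(heatmap)       # snaps to the nearest pixel
expect_x = decode_expectation(heatmap)  # uses the whole distribution

print("argmax error:     ", abs(argmax_x - true_x))
print("expectation error:", abs(expect_x - true_x))
```

In this toy setting the argmax decoder returns pixel 7 and loses the fractional part of the coordinate, while the expectation over the distribution recovers a position much closer to `true_x`. Decoders that use the shape of the predicted distribution rather than its peak alone are one common way to reduce this kind of error.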
Computer Science