Joint graph convolution networks and transformer for human pose estimation in sports technique analysis

Hongren Cheng, Jing Wang, Anran Zhao, Yaping Zhong, Jingli Li, Liangshan Dong
DOI: https://doi.org/10.1016/j.jksuci.2023.101819
IF: 9.006
2023-11-10
Journal of King Saud University - Computer and Information Sciences
Abstract: Human pose estimation has various applications in domains such as sports technology analysis, virtual reality, and education. However, most previous studies focused on the feature representations of individual keypoints and disregarded the topological relationships among keypoints. To address this challenge, we propose GTPose, a network structure that integrates graph convolutional networks (GCN) and the Transformer. First, a set of multi-scale convolution operations is applied to extract local feature maps from images. Second, the positions of keypoints are roughly estimated by using the Transformer to process the sequential relations between feature maps. Finally, a GCN is adopted to model the topological structure among keypoints, so that keypoints are localized accurately and their feature representations are learned. The performance of GTPose is evaluated on two real-world datasets: MS COCO and MPII. Experimental results demonstrate that GTPose outperforms the compared methods in human pose estimation tasks. The results also show that the spatial relationships between keypoints are effective for accurately characterizing keypoints.
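The final stage described in the abstract, using a GCN to model the topology among keypoints, can be illustrated with a minimal sketch. The abstract does not give the layer definition used in GTPose, so the code below assumes the standard graph convolution form ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The five-joint chain skeleton, the coarse coordinates standing in for the Transformer output, and the identity weight matrix are all hypothetical placeholders, not values from the paper.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected keypoint graph."""
    # Start from the identity so that every node keeps a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    mixed = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in mixed]


# Hypothetical 5-joint chain: head - neck - shoulder - elbow - wrist.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_NORM = normalized_adjacency(5, EDGES)

# Hypothetical coarse (x, y) estimates, standing in for the Transformer output.
coarse = [[0.50, 0.10], [0.50, 0.20], [0.40, 0.25], [0.35, 0.40], [0.30, 0.55]]

# Untrained placeholder weight; a real model would learn this matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]

refined = gcn_layer(A_NORM, coarse, identity)
```

With an identity weight the layer only mixes each keypoint's features with those of its skeleton neighbours; it shows how information flows along the graph, not how a trained network would refine positions.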
computer science, information systems