SPGformer: Serial-Parallel Hybrid GCN-Transformer with Graph-Oriented Encoder for 2D-to-3D Human Pose Estimation

Qin Fang, Zihan Xu, Mengxian Hu, Qinyang Zeng, Chengju Liu, Qijun Chen
DOI: https://doi.org/10.1109/tim.2024.3381701
IF: 5.6
2024-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: Accurate acquisition of 3D human joint poses holds significant implications for tasks such as human action recognition. Monocular single-frame 2D-to-3D pose estimation focuses on establishing the correspondence between the 2D human pose in a single image and its 3D spatial pose, delegating the preliminary task of 2D pose estimation to models better suited for processing pixel information. The difficulty of 2D-to-3D pose estimation lies in modeling the spatial constraints among joints. To better learn the structure between joints, this paper proposes the SPGformer algorithm, constructed from stacked Serial-Parallel GCN-Encoder (SPGEncoder) modules. Each module forms a dual-branch framework composed of Transformer Encoders (Encoders) and graph-oriented Encoders (GraEncoders). We recover concealed depth values from the 2D coordinates of joints and input them into the joint branch of the SPGEncoder. In parallel, we take the connection features of joints in the image as the vector branch input. The proposed GraEncoder module integrates a learnable GCN prior to the Encoder, enabling the learning of a broader spectrum of joint connections within the confines of skeletal linkage. Furthermore, this paper presents a method for calculating the 3D absolute pose of the root node, filling a research gap for applications requiring precise human position. This non-learnable, plug-and-play method has been validated on the Human3.6M dataset. The SPGformer algorithm outperforms state-of-the-art methods on both the Human3.6M and MPI-INF-3DHP datasets.
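The abstract gives no implementation details for the GraEncoder, so the following is only a minimal NumPy sketch of one possible reading: a graph convolution over the skeleton, with a learnable additive adjacency term, applied to the joint features before single-head self-attention. The five-joint toy skeleton, the feature width, the additive form of the learnable adjacency, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, a_norm, a_learn, w):
    """One graph convolution: ReLU((A_norm + A_learn) X W).

    a_learn is a trainable offset (an assumption) that lets the layer
    weight joint pairs beyond the fixed skeletal links.
    """
    return np.maximum((a_norm + a_learn) @ x @ w, 0.0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention across joints."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
num_joints, dim = 5, 8                      # hypothetical sizes
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]    # hypothetical toy skeleton
a_norm = normalized_adjacency(edges, num_joints)
a_learn = np.zeros((num_joints, num_joints))  # would be updated by training
x = rng.normal(size=(num_joints, dim))        # per-joint input features
w, wq, wk, wv = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(4))

h = gcn_layer(x, a_norm, a_learn, w)    # graph step, serial with attention
out = self_attention(h, wq, wk, wv)     # attention over graph-refined features
```

With `a_learn` at zero the graph step only mixes features along skeletal links; training it would let the layer emphasize other joint pairs, which is one way to interpret "a broader spectrum of joint connections".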