HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation

Honghong Yang, Hongxi Liu, Yumei Zhang, Xiaojun Wu
DOI: https://doi.org/10.1007/s00530-023-01085-y
IF: 3.9
2023-04-19
Multimedia Systems
Abstract: Because it represents the human skeleton effectively, the graph convolutional network (GCN) is a popular baseline for 3D human pose estimation (HPE). However, current GCN-based 3D HPE methods primarily use "message-passing" architectures that aggregate node information through the edges of the graph at a single scale. In such architectures, the learned node features are uniform and cannot capture a hierarchical representation of graph-structured data. This study proposes a hierarchically stacked graph network (HSGNet) with an attention constraint for 3D HPE. An attention-constrained GCN layer (AGCN) is designed as the basic unit of HSGNet. The AGCN layer computes an attention coefficient for each node in order to pick out the most important node and suppress redundant information from neighbors during feature aggregation. A coarse graph layer with a pooling map is then devised to stack multiple GCN layers hierarchically; its pooling map matrix clusters the nodes according to the human skeleton structure to form the graph representation. Finally, HSGNet is built in an encoder–decoder framework that embeds both the global and the local information of the full skeleton to produce the final embedding feature for 3D pose regression. The method was validated on two benchmark datasets, Human3.6M and MPI-INF-3DHP, and the experimental results showed good performance for 3D HPE.
computer science, information systems, theory & methods
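The two mechanisms named in the abstract, attention-weighted neighbor aggregation and a pooling map that clusters joints by skeleton structure, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 17-joint layout, the edge list, the five-part grouping, the GAT-style scoring function, and all names (`EDGES`, `PARTS`, `adjacency`, `attention_aggregate`, `pooling_map`) are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed 17-joint skeleton (a common Human3.6M-style layout); not taken from the paper.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
         (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16)]

# Hypothetical grouping of joints into five body parts for the coarse graph.
PARTS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


def adjacency(edges, num_joints):
    """Symmetric adjacency matrix with self-loops."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


def attention_aggregate(X, A, a_src, a_dst):
    """GAT-style aggregation (an assumed stand-in for the AGCN layer).

    Each node scores its neighbors, the scores are softmax-normalized over
    the neighborhood, and features are averaged with those weights.
    """
    scores = (X @ a_src)[:, None] + (X @ a_dst)[None, :]   # (N, N) pairwise scores
    scores = np.where(scores > 0, scores, 0.2 * scores)    # LeakyReLU
    scores = np.where(A > 0, scores, -np.inf)              # mask non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ X, weights


def pooling_map(parts, num_joints):
    """(num_parts, num_joints) matrix that averages the joints of each part."""
    P = np.zeros((len(parts), num_joints))
    for row, joints in enumerate(parts.values()):
        P[row, joints] = 1.0 / len(joints)
    return P


rng = np.random.default_rng(0)
X = rng.normal(size=(17, 8))                  # toy per-joint features
A = adjacency(EDGES, 17)
H, weights = attention_aggregate(X, A, rng.normal(size=8), rng.normal(size=8))
P = pooling_map(PARTS, 17)
coarse = P @ H                                # part-level features, shape (5, 8)
```

Here `coarse` holds one feature vector per body part. A hierarchical encoder could apply further graph layers at this coarser scale before a decoder maps back to the joints, but how HSGNet does this in detail is not specified in the abstract.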