Human Pose Estimation Via Parse Graph of Body Structure

Shibang Liu, Xuemei Xie, Guangming Shi
DOI: https://doi.org/10.1109/tcsvt.2024.3435014
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: When observing a person’s body, humans can extract a structured representation of it called a parse graph. The parse graph comprises a hierarchical decomposition of the entire body into parts and primitives, together with context relations expressed as horizontal links between body parts. This ability helps humans locate body structures at different levels. To give a model this ability for single-person pose estimation, we design a hierarchical network that uses convolutional neural networks to model both the context relations and the hierarchical structure in the parse graph of the body. This addresses a shortcoming of most existing methods, which ignore either the context relations or the hierarchical structure. Our network contains a bottom-up stage and a top-down stage. In the bottom-up stage, the structural features of the hierarchy are captured from primitives to parts and then to the entire body. In the top-down stage, proceeding from the entire body to parts and primitives, the structural features of each body part are refined separately, rather than jointly, using that part's context information. Experiments show that our model improves the plausibility of its predictions and achieves superior results on the CrowdPose, COCO keypoint detection, and MPII human pose datasets.
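The two-stage idea in the abstract can be illustrated with a toy data structure. The following is only a minimal sketch under stated assumptions, not the authors' network: each node's feature is a single float standing in for a CNN feature map, aggregation is a plain mean, and refinement is a fixed blend controlled by a hypothetical `alpha` parameter. The names `Node`, `bottom_up`, `collect`, `top_down`, and `build_toy_body` are all illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy parse graph: the body, a part, or a primitive."""
    name: str
    feature: float = 0.0  # assumption: a scalar stand-in for a feature map
    children: List["Node"] = field(default_factory=list)  # hierarchical decomposition
    context: List[str] = field(default_factory=list)  # horizontal links to other parts


def bottom_up(node: Node) -> float:
    """Aggregate features from primitives up to parts and the entire body."""
    if node.children:
        total = sum(bottom_up(child) for child in node.children)
        node.feature = total / len(node.children)
    return node.feature


def collect(node: Node, table: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Snapshot every node's bottom-up feature, keyed by node name."""
    table = {} if table is None else table
    table[node.name] = node.feature
    for child in node.children:
        collect(child, table)
    return table


def top_down(
    node: Node,
    snapshot: Dict[str, float],
    refined: Optional[Dict[str, float]] = None,
    parent: Optional[float] = None,
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Refine each node separately from its parent's refined feature and its context."""
    refined = {} if refined is None else refined
    guidance = [snapshot[name] for name in node.context]
    if parent is not None:
        guidance.append(parent)
    if guidance:
        value = (1 - alpha) * node.feature + alpha * sum(guidance) / len(guidance)
    else:
        value = node.feature  # the root has no parent and no context here
    refined[node.name] = value
    for child in node.children:
        top_down(child, snapshot, refined, value, alpha)
    return refined


def build_toy_body() -> Node:
    """Two arms with three joints each; the feature values are made up."""
    left = Node(
        "left_arm",
        children=[Node("l_shoulder", 0.9), Node("l_elbow", 0.7), Node("l_wrist", 0.2)],
        context=["right_arm"],
    )
    right = Node(
        "right_arm",
        children=[Node("r_shoulder", 0.9), Node("r_elbow", 0.8), Node("r_wrist", 0.7)],
        context=["left_arm"],
    )
    return Node("body", children=[left, right])


body = build_toy_body()
bottom_up(body)
refined = top_down(body, collect(body))
```

In this toy setup, the weak `l_wrist` value is pulled toward the refined value of its parent part, which in turn was influenced by the neighbouring `right_arm` through the horizontal link. Each node is refined on its own path from the root, loosely mirroring the "separately rather than together" refinement the abstract describes.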