Multi-Person Pose Estimation in the Wild: Using Adversarial Method to Train a Top-Down Pose Estimation Network

Tong Zhang, Jingxiang Lian, Jingtao Wen, C. L. Philip Chen
DOI: https://doi.org/10.1109/tsmc.2023.3234611
2023-01-01
IEEE Transactions on Systems, Man, and Cybernetics: Systems
Abstract: Recent studies estimate human anatomical keypoints from a single monocular image, and the quality of the resulting pose estimate depends largely on multichannel heatmaps. Multichannel heatmaps efficiently handle both the image-to-coordinate mapping task and the processing of semantic features. However, most methods ignore the physical constraints and internal relationships of human body parts, so they easily confuse left and right symmetrical parts, which have similar features. Some studies place RNNs on top of the network to incorporate priors about the structure of pose components and body configuration. To address this, a novel top-down convolutional network is proposed that considers these priors during training, which improves robustness under complex conditions in the wild. To learn the prior knowledge of human pose configuration, a hierarchy of fully convolutional networks (the discriminator) is used to distinguish real poses from fake ones. Consequently, the pose network is driven to produce pose estimates that the discriminator mistakes for real ones, which yields plausible poses in complex situations. The method is validated experimentally on the MS COCO human keypoint detection task. The proposed approach outperforms the original method and generates robust pose predictions, demonstrating the effectiveness of adversarial learning.
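The adversarial setup described in the abstract can be illustrated with a minimal sketch of the two competing objectives. The abstract does not give the paper's loss functions, so everything here is an assumption: the standard binary cross-entropy GAN formulation, the mean-squared-error heatmap term, the hypothetical weight `adv_weight`, and all function names are chosen for illustration only.

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy of one probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def mse(pred, truth):
    """Mean squared error between two flattened heatmaps."""
    return sum((a - b) ** 2 for a, b in zip(pred, truth)) / len(pred)

def discriminator_loss(d_real, d_fake):
    """Assumed discriminator objective: score ground-truth poses as real (1)
    and poses produced by the pose network as fake (0)."""
    return bce(d_real, 1) + bce(d_fake, 0)

def pose_network_loss(pred_heatmap, gt_heatmap, d_fake, adv_weight=0.01):
    """Assumed pose-network objective: heatmap regression plus an adversarial
    term that is small when the discriminator scores the prediction as real.
    adv_weight is a hypothetical balancing factor, not taken from the paper."""
    return mse(pred_heatmap, gt_heatmap) + adv_weight * bce(d_fake, 1)

# Toy values: d_real and d_fake stand in for discriminator output probabilities.
gt = [0.0, 1.0, 0.0, 0.0]
pred = [0.1, 0.7, 0.1, 0.1]
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(pose_network_loss(pred, gt, d_fake=0.1))
print(pose_network_loss(pred, gt, d_fake=0.9))
```

In this sketch the pose network's loss falls as `d_fake` rises, which mirrors the abstract's statement that the pose network is pushed toward estimates the discriminator accepts as real. In an actual system, both scores would come from a trained discriminator network applied to heatmaps rather than from fixed numbers.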