Learning Recurrent Structure-Guided Attention Network for Multi-person Pose Estimation.

Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu
DOI: https://doi.org/10.1109/icme.2019.00079
2019-01-01
Abstract: Multi-person pose estimation aims to localize tens of human joints (e.g., elbow and wrist) from multiple human bodies in an image. Existing approaches mainly adopt a two-stage pipeline, which usually consists of a human detector (i.e., generating a bounding box for each person) and a single-person pose estimator (i.e., generating human joints from each bounding box). However, these approaches neglect the challenges of large pose variations and heavy occlusions within each bounding box, which often leads to imprecise human joint localization. In this paper, we propose a structure-guided attention network (SGAN) for multi-person pose estimation. Specifically, a structured pose representation is encoded by learning a joint confidence map and a joint association map, which SGAN then refines in a recurrent way. SGAN enables a deep neural network to take the initial pose estimate as a reference and to discover multi-scale pose features that complete it, thereby reinforcing the learning of pose structures. Extensive experiments show the best single-model results among state-of-the-art approaches, with a relative 3.5% mAP gain on the challenging COCO Keypoint dataset.
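As a rough illustration of the joint confidence map mentioned in the abstract, the sketch below reads one joint location per map by taking the highest-scoring pixel. This is a common way to decode such maps in general and is not taken from the paper; the names `decode_joint`, `decode_pose`, and `toy_maps`, as well as the toy values, are hypothetical, and the joint association map and recurrent refinement are not modeled here.

```python
from typing import Dict, List, Tuple

# One confidence map per joint: rows of per-pixel scores (assumed layout).
Heatmap = List[List[float]]


def decode_joint(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-confidence pixel in one map."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


def decode_pose(heatmaps: Dict[str, Heatmap]) -> Dict[str, Tuple[int, int, float]]:
    """Decode every joint independently from its own confidence map."""
    return {name: decode_joint(hm) for name, hm in heatmaps.items()}


# Toy 3x4 maps with made-up values, purely for illustration.
toy_maps = {
    "elbow": [[0.0, 0.1, 0.0, 0.0],
              [0.1, 0.9, 0.2, 0.0],
              [0.0, 0.1, 0.0, 0.0]],
    "wrist": [[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.3],
              [0.0, 0.0, 0.2, 0.8]],
}

pose = decode_pose(toy_maps)
print(pose)
```

Decoding each joint independently like this is exactly where occlusion and large pose variation can cause errors, which is the motivation the abstract gives for refining the maps with structural guidance.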