Multi-person pose estimation using atrous convolution

Lu Lin, Yifei Wang, Lianghao Wang, Ming Zhang, Dongxiao Li
DOI: https://doi.org/10.1049/el.2019.0351
2019-01-01
Electronics Letters
Abstract: Human keypoint localisation has improved greatly with the development of deep neural networks. In particular, recent methods that exploit multi-scale features and cascaded networks achieve accurate prediction of multi-person keypoints. These methods typically extract low-resolution feature maps with a classical backbone and then generate heatmaps through upsampling. However, consecutive striding is harmful for keypoint localisation because it decimates detail information. In this Letter, the authors present a novel network structure that uses atrous spatial pyramid pooling to generate keypoint predictions. First, atrous convolution is used in the backbone to expand the receptive field while maintaining the scale of the feature map. This keeps the feature map large enough that too many details are not discarded. Second, an atrous spatial pyramid pooling module extracts multi-scale features to enrich the scale information of the obtained features. Finally, instead of upsampling, deconvolutional layers are applied to construct the output heatmaps. The method achieves state-of-the-art results on the MS COCO 2017 keypoint dataset.
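The first step, using atrous convolution instead of striding, can be illustrated with a minimal single-channel sketch in pure Python. It assumes zero "same" padding and an all-ones 3x3 kernel; `conv2d`, `effective_kernel_size` and `response_extent` are hypothetical helpers, not code from the Letter. A k-tap kernel with dilation rate r spans k + (k - 1)(r - 1) pixels, so a rate-2 3x3 kernel covers a 5x5 window while the output keeps the input size, whereas a stride-2 layer halves it.

```python
from typing import List

Grid = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Extent covered by a k-tap kernel whose taps are `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def conv2d(x: Grid, w: Grid, rate: int = 1, stride: int = 1) -> Grid:
    """Single-channel 2D convolution (cross-correlation, as in CNN libraries).

    Zero padding is chosen so that stride 1 keeps the input size.
    rate > 1 gives an atrous (dilated) convolution; rate == 1 is ordinary.
    Assumes a square, odd-sized kernel.
    """
    h, wd, k = len(x), len(x[0]), len(w)
    pad = (effective_kernel_size(k, rate) - 1) // 2
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, wd, stride):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spread `rate` pixels apart around (i, j).
                    y, z = i - pad + a * rate, j - pad + b * rate
                    if 0 <= y < h and 0 <= z < wd:
                        acc += x[y][z] * w[a][b]
            row.append(acc)
        out.append(row)
    return out


def response_extent(y: Grid) -> int:
    """Height of the bounding box of nonzero outputs (impulse-response size)."""
    rows = [i for i, r in enumerate(y) if any(v != 0 for v in r)]
    return rows[-1] - rows[0] + 1


# One bright pixel in an 8x8 map, convolved with an all-ones 3x3 kernel.
impulse = [[0.0] * 8 for _ in range(8)]
impulse[4][4] = 1.0
ones3 = [[1.0] * 3 for _ in range(3)]

plain = conv2d(impulse, ones3, rate=1)
atrous = conv2d(impulse, ones3, rate=2)
strided = conv2d(impulse, ones3, rate=1, stride=2)
print(len(plain), response_extent(plain))    # 8 3
print(len(atrous), response_extent(atrous))  # 8 5
print(len(strided))                          # 4
```

In DeepLab-style backbones the usual pattern (assumed here; the abstract does not give the Letter's exact layer changes) is to drop the stride of a late stage and dilate the convolutions that follow, so those layers keep the context they would have had on a downsampled map while the feature map keeps its resolution.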
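Atrous spatial pyramid pooling applies several atrous convolutions with different rates to the same feature map in parallel and stacks their outputs, so every position carries context at several scales. The sketch below is a simplified, hypothetical version: the rates (1, 2, 4), the single input channel and the all-ones kernels are illustrative assumptions, and it omits the extra branches (such as a 1x1 convolution and image-level pooling) and the fusion convolution used in DeepLab's ASPP. The abstract does not specify the Letter's exact module configuration.

```python
from typing import List

Grid = List[List[float]]


def atrous_conv2d(x: Grid, w: Grid, rate: int) -> Grid:
    """Stride-1, single-channel atrous convolution with zero 'same' padding
    (output size == input size). Assumes a square, odd-sized kernel."""
    h, wd, k = len(x), len(x[0]), len(w)
    pad = rate * (k - 1) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            for a in range(k):
                for b in range(k):
                    y, z = i - pad + a * rate, j - pad + b * rate
                    if 0 <= y < h and 0 <= z < wd:
                        out[i][j] += x[y][z] * w[a][b]
    return out


def aspp(x: Grid, kernels: List[Grid], rates: List[int]) -> List[Grid]:
    """Simplified ASPP: one atrous branch per rate, all applied to the same
    input in parallel; the equally sized outputs are stacked as channels."""
    return [atrous_conv2d(x, w, r) for w, r in zip(kernels, rates)]


def context_extent(f: Grid) -> int:
    """Height of the region that responds to a single input pixel."""
    rows = [i for i, row in enumerate(f) if any(v != 0 for v in row)]
    return rows[-1] - rows[0] + 1


x = [[0.0] * 16 for _ in range(16)]
x[8][8] = 1.0                            # single bright pixel
ones3 = [[1.0] * 3 for _ in range(3)]
rates = [1, 2, 4]                        # illustrative rates (assumption)
features = aspp(x, [ones3] * len(rates), rates)

for r, f in zip(rates, features):
    print(f"rate {r}: {len(f)}x{len(f[0])}, context {context_extent(f)} px")
# rate 1: 16x16, context 3 px
# rate 2: 16x16, context 5 px
# rate 4: 16x16, context 9 px
```

Every branch returns a map of the same size, so the branches can be stacked channel-wise without resizing, while the window each branch responds to grows with its rate.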
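The final step, building heatmaps with deconvolutional (transposed convolution) layers, can be sketched the same way. Unlike fixed interpolation, a deconvolution layer has trainable weights. The single-channel `deconv2d` below is a hypothetical illustration: kernel size 4, stride 2 and padding 1 are assumed values (a common choice that exactly doubles resolution), the fixed bilinear weights stand in for weights the network would learn, and the 8x6 input and three stacked layers are illustrative only. The output length follows (n - 1)s - 2p + k.

```python
from typing import List

Grid = List[List[float]]


def deconv_output_size(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Output length of a transposed convolution (no output padding/dilation)."""
    return (n - 1) * s - 2 * p + k


def deconv2d(x: Grid, w: Grid, s: int = 2, p: int = 1) -> Grid:
    """Single-channel 2D transposed convolution: each input pixel scatters a
    copy of the kernel, scaled by its value, onto a grid `s` times larger."""
    h, wd, k = len(x), len(x[0]), len(w)
    oh, ow = deconv_output_size(h, k, s, p), deconv_output_size(wd, k, s, p)
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(wd):
            for a in range(k):
                for b in range(k):
                    y, z = i * s - p + a, j * s - p + b
                    if 0 <= y < oh and 0 <= z < ow:
                        out[y][z] += x[i][j] * w[a][b]
    return out


# Illustrative fixed weights: the 4x4 bilinear kernel for 2x upsampling.
# In the network these weights would be learned, not fixed.
taps = [0.25, 0.75, 0.75, 0.25]
bilinear = [[u * v for v in taps] for u in taps]

feat = [[1.0] * 6 for _ in range(8)]  # hypothetical 8x6 feature map
up = deconv2d(feat, bilinear)
print(len(up), len(up[0]))  # 16 12

size = (8, 6)
for _ in range(3):  # e.g. three stacked 2x deconvolution layers
    size = tuple(deconv_output_size(n) for n in size)
print(size)  # (64, 48)
```

With these settings each layer doubles the map size, so the number of deconvolution layers needed depends on how much resolution the backbone has kept; an atrous backbone that strides less needs fewer of them to reach a given heatmap size.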