Improved Modular Convolution Neural Network for Human Pose Estimation

Zhengxuan Zhang,Jing Dong,Dongsheng Zhou,Xiaoyong Fang,Xiaopeng Wei
DOI: https://doi.org/10.1007/978-3-030-23712-7_53
2019-01-01
Abstract:Human pose estimation in image is an important branch of computer vision and graphics research. In this paper, an improved modular convolution neural network is proposed to solve the problem of human pose estimation in static 2D images. A cascaded three-stage full convolutional network (FCN) can learn the non-linear mapping from image feature space to human pose space in an end-to-end way. In order to improve the accuracy of predicting joints, the method of multi-feature source fusion is adopted to improve the estimation process of the human body posture. The first two stages of the network focus on learning local image features and joints neighborhood pixel features, and these features are merged in the third stage of the network. Finally, the coordinates of human joints are obtained by regression of the merged features. In our experiments, using the strict PCP criteria on the full body pose dataset LSP, the average prediction accuracy of our method is 79.3%. In addition, using the PCKh standard on the upper body pose dataset FLIC, our method achieves an average prediction accuracy of 93% without additional training.
What problem does this paper attempt to address?