Abstract: Convolutional neural networks (CNNs) have brought tremendous improvements in estimating 3D human pose from a monocular RGB image. However, 3D human pose estimation remains extremely challenging, especially when it comes to estimating the depth of human body parts. Unlike 2D human pose estimation, which focuses on fusing spatial information with context information, depth estimation demands more context information. Motivated by this observation, we build a Context-Aware Network (CAN) that fully explores context information to discover the underlying relationships among different body parts. The key ingredient of our network is the High-Level Depth Estimation Module (HLDEM), which is designed to extract context information effectively. Additionally, we introduce multi-scale supervision into the network to extract context information at different scales. Experimental results show that our network achieves competitive performance compared with state-of-the-art methods on the Human3.6M dataset.

Context-Aware Network for 3D Human Pose Estimation from Monocular RGB Image