Abstract: We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in realtime. The key idea of our approach is a novel end-to-end multi-task deep learning framework that uses single images to predict five outputs simultaneously: foreground segmentation mask, 2D joint positions, semantic body partitions, 3D part orientations and uv coordinates (uv map). The multi-task network architecture not only generates more visual cues for reconstruction, but also makes each individual prediction more accurate. The CNN regressor is further combined with an optimization-based algorithm for accurate kinematic pose reconstruction and full-body shape modeling. We show that the realtime reconstruction reaches accurate fitting that has not been seen before, especially for in-the-wild images. We demonstrate the results of our realtime 3D pose and human body reconstruction system on various challenging in-the-wild videos. We show that the system advances the frontier of 3D human body and pose reconstruction from single images through quantitative evaluations and comparisons with state-of-the-art methods.

What problem does this paper attempt to address?

This paper addresses the problem of reconstructing, in real time and from a single RGB image, both the 3D pose of an arbitrarily posed human body and a detailed 3D full-body geometric model. It proposes a novel end-to-end multi-task deep learning framework that simultaneously predicts five outputs from a single image: a foreground segmentation mask, 2D joint positions, semantic body partitions, 3D part orientations, and uv coordinates (uv map). The multi-task design not only generates more visual cues for reconstruction but also makes each individual prediction more accurate. The network is further combined with an optimization-based algorithm for accurate kinematic pose reconstruction and full-body shape modeling, reaching a fitting accuracy in real time that the authors report has not been seen before, especially on in-the-wild images.

The key innovations of the paper are as follows:

1. **Real-time performance**: A specially designed neural network regresses multiple human body structure features from a single image in real time. Its output is fed to an efficient optimizer that fits the 3D pose and body geometry, so the full reconstruction also runs in real time.
2. **Fully automatic and robust**: The rich per-frame regression output enables reconstruction from a single image without relying on any prior initialization, so reconstruction on video does not suffer from re-initialization problems. The system is also highly robust to illumination changes and clothing diversity.
3. **Accuracy**: On in-the-wild images, the reconstruction quality of the real-time system is higher than that of most offline or video-based methods.

These results are attributed mainly to three factors: (1) the novel multi-task deep learning network predicts rich features that reinforce one another; (2) all visual features produced by the network are efficiently and seamlessly integrated into the reconstruction process; (3) the existing training datasets are extended with newly collected data. Through these innovations, the paper advances the state of the art in real-time reconstruction of 3D human poses and detailed geometric models from a single image, and it supports this with comparisons against alternative methods.
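As a rough illustration of what "predicting five outputs simultaneously" can mean during training, the sketch below combines one loss per output into a single objective. This is only an assumption made for illustration: the summary does not state how the paper trains its network, and the names `TASKS`, `TaskLosses`, and `total_loss`, the weighted-sum formulation, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# The five outputs named in the abstract; the identifiers are hypothetical.
TASKS = ("mask", "joints_2d", "body_parts", "part_orientations", "uv_map")


@dataclass(frozen=True)
class TaskLosses:
    """One scalar loss per predicted output (illustrative values only)."""
    mask: float
    joints_2d: float
    body_parts: float
    part_orientations: float
    uv_map: float


def total_loss(losses: TaskLosses, weights: Mapping[str, float]) -> float:
    """Weighted sum over the five tasks; the weighting scheme is an assumption."""
    missing = [name for name in TASKS if name not in weights]
    if missing:
        raise KeyError(f"missing weights for: {missing}")
    return sum(weights[name] * getattr(losses, name) for name in TASKS)


# Made-up per-task losses and uniform weights, purely for demonstration.
losses = TaskLosses(mask=0.5, joints_2d=0.25, body_parts=0.5,
                    part_orientations=1.0, uv_map=0.25)
uniform = {name: 1.0 for name in TASKS}
print(total_loss(losses, uniform))
```

In a shared-backbone setup of this kind, a single scalar objective lets gradients from all five tasks update the same features, which is one plausible reading of the claim that the predictions reinforce one another.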

Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies from Single RGB Images

High-precision Human Body Acquisition Via Multi-View Binocular Stereopsis

3D Human Reconstruction from A Single Depth Image

Dynamic Human Body Reconstruction and Motion Tracking with Low-Cost Depth Cameras

Real-time human pose recognition in parts from single depth images

3D real-time human reconstruction with a single RGBD camera

Ihuman3d: Intelligent Human Body 3D Reconstruction Using a Single Flying Camera

LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies

Marker-Less 3d Human Motion Capture With Monocular Image Sequence And Height-Maps

Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

DeepHuman: 3D Human Reconstruction from a Single Image

Learning Pose Controllable Human Reconstruction with Dynamic Implicit Fields from a Single Image

Deep Textured 3D Reconstruction of Human Bodies

Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

3D Human Pose and Shape Estimation with Dense Correspondence from a Single Depth Image

Coherent Reconstruction of Multiple Humans from a Single Image

Motion Capture Research: 3D Human Pose Recovery Based on RGB Video Sequences

Monocular Real-time Full Body Capture with Inter-part Correlations

Reconstructing 3D Human Pose from RGB-D Data with Occlusions

LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation