Abstract: Reconstructing a 3D human body mesh from a monocular image is a challenging inverse problem because of occlusion and complex human articulation. Recent deep learning-based methods have made significant progress in single-image human reconstruction. Most of these works are either model-based methods, which estimate the sparse parameters of a statistical human body model, or model-free methods, which directly regress the vertex coordinates of a predefined mesh. However, model-based methods tend to lose detail because of their limited parameter space, and model-free methods struggle to recover satisfactory results directly from images, both because they use a single global feature shared by all vertices and because of the domain gap between regular 2D images and irregular 3D meshes. To address these issues, we propose a hybrid model that combines the advantages of the model-based and model-free approaches to estimate a 3D human mesh in a coarse-to-fine manner. First, we use a convolutional neural network (CNN) to estimate the parameters of the Skinned Multi-Person Linear (SMPL) model, from which we generate a coarse human mesh. The vertex coordinates of the coarse mesh are then refined by a graph convolutional neural network (GCN). Unlike previous GCN-based methods, which recover vertex coordinates from a shared global feature, we propose a LOcal CorRespondence-Aware (LOCRA) module that extracts local features specific to each vertex. To tie these local features to the human pose, we also add a keypoint-related loss to supervise the training of the LOCRA module. Experiments demonstrate that our hybrid model with the LOCRA module outperforms existing methods on multiple public benchmarks. Our code will be publicly available.
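
The refinement stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the LOCRA module, whose design the abstract does not specify. It only shows the general idea of giving each vertex its own local image feature instead of one shared global feature, then passing those features through a single graph convolution that predicts per-vertex offsets. The weak-perspective camera, the bilinear sampling, the single-layer GCN, and every function and parameter name below are assumptions made for illustration.

```python
import numpy as np

def project_weak_perspective(vertices, scale, trans):
    """Project (V, 3) vertices to 2D as scale * (xy + trans). Assumed camera model."""
    return scale * (vertices[:, :2] + trans)

def sample_bilinear(feat, xy):
    """Bilinearly sample a (H, W, C) feature map at (V, 2) points in [-1, 1] coords."""
    H, W, _ = feat.shape
    # Map normalized coordinates to pixel coordinates and clamp to the map.
    x = np.clip((xy[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((xy[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def normalized_adjacency(edges, num_vertices):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the mesh graph."""
    A = np.eye(num_vertices)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def refine(vertices, feat, edges, weight, scale=1.0, trans=(0.0, 0.0)):
    """Refine coarse vertices with one graph convolution over per-vertex local features."""
    xy = project_weak_perspective(vertices, scale, np.asarray(trans))
    local = sample_bilinear(feat, xy)                # (V, C): one feature per vertex
    x = np.concatenate([vertices, local], axis=1)    # (V, 3 + C)
    offsets = normalized_adjacency(edges, len(vertices)) @ x @ weight  # (V, 3)
    return vertices + offsets

# Toy stand-ins: a tetrahedron for the coarse mesh and random values
# in place of a CNN feature map and learned GCN weights.
rng = np.random.default_rng(0)
coarse = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]])
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
feature_map = rng.normal(size=(8, 8, 4))
weight = 0.01 * rng.normal(size=(3 + 4, 3))
refined = refine(coarse, feature_map, edges, weight)
print(refined.shape)
```

In a real system the coarse vertices would come from an SMPL layer driven by CNN-predicted parameters, and the weights would be learned end to end; here both are replaced by toy values so the sketch runs on its own.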

MH‐HMR: Human mesh recovery from monocular images via multi‐hypothesis learning

In-Hand 3D Object Reconstruction from a Monocular RGB Video

Recovering 3D Human Mesh from Monocular Images: A Survey

Human Mesh Recovery from Arbitrary Multi-view Images

End-to-end Recovery of Human Shape and Pose

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos

Marker-Less 3d Human Motion Capture With Monocular Image Sequence And Height-Maps

VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds

Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Multi-view Human Body Mesh Translator

MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose

Synthetic Training for Monocular Human Mesh Recovery

W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration

Self-supervised 3D Human Mesh Recovery from Noisy Point Clouds

Implicit 3D Human Mesh Recovery using Consistency with Pose and Shape from Unseen-view

Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses

Learning Human Mesh Recovery in 3D Scenes

Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

A Local Correspondence-aware Hybrid CNN-GCN Model for Single-image Human Body Reconstruction