Abstract: Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. Existing methods face difficulties including a) the varying body proportions captured by in-the-wild human images; b) diverse personal belongings within the shot; and c) ambiguities in human postures and inconsistency in human textures. In addition, the scarcity of high-quality human data intensifies the challenge. To address these problems, we propose a Generalizable image-to-3D huMAN reconstruction framework, dubbed GeneMAN, building upon a comprehensive multi-source collection of high-quality human data, including 3D scans, multi-view videos, single photos, and our generated synthetic human data. GeneMAN encompasses three key modules. 1) Without relying on parametric human models (e.g., SMPL), GeneMAN first trains a human-specific text-to-image diffusion model and a view-conditioned diffusion model, serving as GeneMAN 2D human prior and 3D human prior for reconstruction, respectively. 2) With the help of the pretrained human prior models, the Geometry Initialization-&-Sculpting pipeline is leveraged to recover high-quality 3D human geometry given a single image. 3) To achieve high-fidelity 3D human textures, GeneMAN employs the Multi-Space Texture Refinement pipeline, consecutively refining textures in the latent and the pixel spaces. Extensive experimental results demonstrate that GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods. Notably, GeneMAN could reveal much better generalizability in dealing with in-the-wild images, often yielding high-quality 3D human models in natural poses with common items, regardless of the body proportions in the input images.

What problem does this paper attempt to address?

This paper addresses the problem of reconstructing a high-quality 3D human model from a single in-the-wild image. Existing methods face the following challenges on such images:

1. **Varying body proportions**: In-the-wild photos may be full-body, half-body, or close-up shots, while existing methods mainly focus on full-body reconstruction.
2. **People with carried items**: In everyday photos, people often hold items, stand on objects, or wear various accessories, all of which can seriously degrade reconstruction quality.
3. **Natural postures and textures**: Because widely applicable human geometry and texture models are lacking, existing methods struggle to reconstruct plausible geometry and consistent textures from real-world images.
4. **Scarcity of high-quality human data**: The lack of high-quality human data makes the problem harder still.

To address these challenges, the paper proposes GeneMAN, a generalizable single-image-to-3D human reconstruction framework. Building on a multi-source collection of high-quality human data, GeneMAN trains human-specific prior models and uses them to generate a high-quality 3D human model from a single in-the-wild image. Specifically, GeneMAN includes the following key modules:

1. **Geometry initialization and sculpting**: Use NeRF for an initial geometry prediction, then use DMTet for high-resolution refinement that adds geometric detail.
2. **Multi-space texture refinement**: First generate coarse textures in the latent space, then optimize in the pixel space to obtain detailed 3D textures.

With these modules, GeneMAN can generate high-quality 3D human models from a single in-the-wild image, regardless of the body proportions, posture, clothing, or personal items of the person in the input image.

Experimental results show that GeneMAN generalizes better and produces higher-quality results than prior methods on in-the-wild images.
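
The coarse-to-fine ordering of the stages described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `HumanAsset`, the four stage functions, and `reconstruct` are hypothetical names, and each stage is a placeholder that only records that it ran. The sketch is meant to show only that geometry is initialized before it is sculpted, and that texture is refined in the latent space before the pixel space.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HumanAsset:
    """Hypothetical container for the state passed between stages."""
    image_path: str
    geometry: str = "none"
    texture: str = "none"
    history: List[str] = field(default_factory=list)


def init_geometry(asset: HumanAsset) -> HumanAsset:
    # Placeholder for the coarse geometry prediction (NeRF in the summary).
    asset.geometry = "coarse"
    asset.history.append("geometry_init")
    return asset


def sculpt_geometry(asset: HumanAsset) -> HumanAsset:
    # Placeholder for high-resolution refinement (DMTet in the summary).
    asset.geometry = "detailed"
    asset.history.append("geometry_sculpt")
    return asset


def refine_texture_latent(asset: HumanAsset) -> HumanAsset:
    # Placeholder for coarse texture generation in the latent space.
    asset.texture = "coarse"
    asset.history.append("texture_latent")
    return asset


def refine_texture_pixel(asset: HumanAsset) -> HumanAsset:
    # Placeholder for detailed texture optimization in the pixel space.
    asset.texture = "detailed"
    asset.history.append("texture_pixel")
    return asset


# Stage order follows the summary: geometry first, then texture.
STAGES: List[Callable[[HumanAsset], HumanAsset]] = [
    init_geometry,
    sculpt_geometry,
    refine_texture_latent,
    refine_texture_pixel,
]


def reconstruct(image_path: str) -> HumanAsset:
    """Run every placeholder stage in order on a single input image."""
    asset = HumanAsset(image_path=image_path)
    for stage in STAGES:
        asset = stage(asset)
    return asset
```

Calling `reconstruct("person.jpg")` returns an asset whose `history` lists the four stages in the order they ran. In a real system each placeholder would be an optimization loop guided by the pretrained 2D and 3D human priors.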

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Personalized 3D Mannequin Reconstruction Based on 3D Scanning

3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Single Image, Any Face: Generalisable 3D Face Generation

3D Human Reconstruction from A Single Depth Image

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

Image-Guided Human Reconstruction via Multi-Scale Graph Transformation Networks

Human Mesh Reconstruction with Generative Adversarial Networks from Single RGB Images

Enhanced Multi-Scale Attention-Driven 3D Human Reconstruction from Single Image

3D-Aware Semantic-Guided Generative Model for Human Synthesis

Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors

CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

DeepHuman: 3D Human Reconstruction from a Single Image

Generalizing Monocular 3D Human Pose Estimation in the Wild

CapHuman: Capture Your Moments in Parallel Universes

SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation

Human Bas-Relief Generation from A Single Photograph