Abstract: Pixel-aligned implicit functions (IFs) enable the reconstruction of 3D humans with complete and detailed clothing from a single RGB image. To improve robustness to pose, existing works introduce a parametric body model as a prior, but this limits the recovery of geometric detail and makes loose clothing difficult to handle. Our goal is to reconstruct clothing and pose that both align closely with the input image, even for unusual poses and complex clothing. To achieve this, we propose a multi-scale feature-based implicit method, called RICH, which combines the flexibility of implicit functions with the strong prior of a parametric body model. RICH introduces a 3D human body model as prior knowledge and uses local features to constrain the generation of the human body. Furthermore, RICH employs a pretrained image encoder to extract global pixel-aligned features, which contribute to high-precision and complete reconstruction of the clothing geometry and of external appearance such as hair and accessories. In addition, by establishing connections with the joints of the body model, RICH uses an attention mechanism to construct relative spatial features, thereby increasing robustness to pose. Finally, RICH feeds the local, relative, and global features into the IF to query occupancy, and the clothed human is represented by the 0.5 iso-surface of the 3D occupancy field. Quantitative and qualitative evaluations on the THuman2.0 and CAPE datasets show that RICH outperforms state-of-the-art methods. In particular, RICH demonstrates strong generalization on in-the-wild images, even under challenging poses and complex clothing. The code and supplementary material will be available at https://github.com/lyk412/RICH .
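
The final step of the abstract (three feature types fed to an IF, then thresholded at 0.5) can be illustrated with a minimal NumPy sketch. This is not the RICH implementation: every function name here is hypothetical, the orthographic projection, the distance-based attention over joints, the nearest-vertex distance used as the "local" feature, and the untrained random-weight decoder are all assumptions chosen only to show how such features might be assembled per query point.

```python
import numpy as np

def sample_pixel_aligned(feat_map, xy):
    """Bilinearly sample a (C, H, W) feature map at points xy in [-1, 1]^2."""
    C, H, W = feat_map.shape
    # Map normalized coordinates to continuous pixel indices.
    x = (xy[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (xy[:, 1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = feat_map[:, y0, x0] * (1 - wx) + feat_map[:, y0, x0 + 1] * wx
    bot = feat_map[:, y0 + 1, x0] * (1 - wx) + feat_map[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bot * wy).T  # (N, C)

def relative_joint_feature(points, joints, temperature=0.1):
    """Assumed form of attention: softmax over negative point-to-joint distances."""
    offsets = points[:, None, :] - joints[None, :, :]   # (N, J, 3)
    logits = -np.linalg.norm(offsets, axis=-1) / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)        # (N, J), rows sum to 1
    return (weights[..., None] * offsets).sum(axis=1), weights

def query_occupancy(points, feat_map, joints, body_vertices, rng):
    """Concatenate local, relative, and global features and decode occupancy."""
    # Global feature: orthographic projection (drop z) is an assumption.
    global_feat = sample_pixel_aligned(feat_map, points[:, :2])
    relative_feat, _ = relative_joint_feature(points, joints)
    # Local feature stand-in: distance to the nearest body-model vertex.
    dists = np.linalg.norm(points[:, None, :] - body_vertices[None, :, :], axis=-1)
    local_feat = dists.min(axis=1, keepdims=True)
    features = np.concatenate([local_feat, relative_feat, global_feat], axis=1)
    # Untrained two-layer decoder with random weights, for shape illustration only.
    w1 = rng.standard_normal((features.shape[1], 16)) * 0.1
    w2 = rng.standard_normal((16, 1)) * 0.1
    hidden = np.maximum(features @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)[:, 0]))    # occupancy in (0, 1)

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
occupancy = query_occupancy(
    points,
    feat_map=rng.standard_normal((8, 32, 32)),
    joints=rng.uniform(-1.0, 1.0, size=(24, 3)),
    body_vertices=rng.uniform(-1.0, 1.0, size=(100, 3)),
    rng=rng,
)
inside = occupancy > 0.5  # points on the interior side of the 0.5 iso-surface
```

With a trained decoder evaluated on a dense grid, the surface itself would typically be extracted from the 0.5 level set with a marching-cubes routine; that step is omitted here.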

Visual and Spatial Context Fusion for Implicit Human Reconstruction

Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction

Disambiguating Monocular Reconstruction of 3D Clothed Human with Spatial-Temporal Transformer

3D Human Reconstruction from A Single Depth Image

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Multi‐Level Implicit Function for Detailed Human Reconstruction by Relaxing SMPL Constraints

Pixel2ISDF: Implicit Signed Distance Fields based Human Body Model from Multi-view and Multi-pose Images

Learning Pose Controllable Human Reconstruction with Dynamic Implicit Fields from a Single Image

SeSDF: Self-evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction

RICH: Robust Implicit Clothed Humans Reconstruction from Multi-scale Spatial Cues.

AdaptiveFusion: Adaptive Multi-Modal Multi-View Fusion for 3D Human Body Reconstruction

Deep Implicit Templates for 3D Shape Representation

Implicit Neural Representations With Structured Latent Codes for Human Body Modeling

Local Deep Implicit Functions for 3D Shape

Detailed 3D Human Body Reconstruction from Multi-view Images Combining Voxel Super-Resolution and Learned Implicit Representation

LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction

High-Resolution Volumetric Reconstruction for Clothed Humans