Abstract: Reconstructing a 3D human mesh from a single image has long been challenging because a single viewpoint is inherently limited. Deep learning networks can approximate the shape of the unseen side, but recovering its texture details from one image remains difficult. Traditional methods use Generative Adversarial Networks (GANs) to predict normal maps of the non-visible side and, from them, infer detailed textures and wrinkles on the model's surface. However, we find that existing normal prediction networks struggle in complex scenes: they pay too little attention to local features and model spatial relationships insufficiently. To address these challenges, we introduce EMAR (Enhanced Multi-scale Attention-Driven Single-Image 3D Human Reconstruction). The approach incorporates a novel Enhanced Multi-Scale Attention (EMSA) mechanism that captures both intricate features and global relationships in complex scenes. Unlike traditional single-scale attention mechanisms, EMSA adaptively adjusts the weights between features, so the network can exploit information across scales more effectively. We also improve the feature fusion method to better integrate representations from different scales, which lets the network capture both fine details and global structure in the image. Finally, we design a hybrid loss function tailored to the attention mechanism and the feature fusion method, which improves network training and the quality of the reconstruction results. Experimental results show that our network significantly improves 3D human model reconstruction and is more robust to challenging poses than traditional single-scale approaches.
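The idea of adaptively weighting features across scales can be illustrated with a minimal NumPy sketch. This is not the EMSA module itself, whose architecture the abstract does not specify. It assumes a generic scheme: the same feature map is pooled at several scales, each scale receives a per-pixel score, and a softmax over the scales gives the fusion weights. All names here (`multi_scale_attention_fusion`, `score_w`, the scale factors) are hypothetical, and the scoring vector stands in for parameters that a real network would learn.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a (H, W, C) feature map by an integer factor k."""
    h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsample a (H, W, C) feature map by a factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention_fusion(feat, scales=(1, 2, 4), score_w=None):
    """Fuse multi-scale versions of `feat` with per-pixel softmax weights.

    feat:    (H, W, C) feature map.
    scales:  pooling factors (assumed values, chosen for illustration).
    score_w: (C,) scoring vector; hypothetical stand-in for learned weights.
    """
    h, w, c = feat.shape
    if score_w is None:
        score_w = np.ones(c) / c
    # Build one coarse-to-fine branch per scale, all at full resolution.
    branches = [upsample(avg_pool(feat, k), k) for k in scales]
    stack = np.stack(branches, axis=0)                 # (S, H, W, C)
    scores = stack @ score_w                           # (S, H, W)
    weights = softmax(scores, axis=0)                  # sums to 1 over scales
    fused = (weights[..., None] * stack).sum(axis=0)   # (H, W, C)
    return fused, weights

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 3))
fused, weights = multi_scale_attention_fusion(feat)
print(fused.shape, weights.shape)
```

Because the weights are computed per pixel, regions with strong local detail can favor the fine scale while smooth regions lean on the coarse ones. A single-scale attention layer corresponds to `scales=(1,)`, where the weights are trivially 1.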

Topology-Preserved Human Reconstruction with Details

3D Human Reconstruction from A Single Depth Image

Image-Guided Human Reconstruction via Multi-Scale Graph Transformation Networks

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video

Detailed Human Shape Estimation From A Single Image By Hierarchical Mesh Deformation

LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies

DeepHuman: 3D Human Reconstruction from a Single Image

Deep Mesh Reconstruction from Single RGB Images via Topology Modification Networks

FastHuman: Reconstructing High-Quality Clothed Human in Minutes

Enhanced Multi-Scale Attention-Driven 3D Human Reconstruction from Single Image

Geometry-aware Two-scale PIFu Representation for Human Reconstruction

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Toward Clothing Personalized Customization Multi-Perspective Silhouettes 3D Human Body Rapid Reconstruction

Dynamic Human Body Reconstruction and Motion Tracking with Low-Cost Depth Cameras

Shape-from-Mask: A Deep Learning Based Human Body Shape Reconstruction from Binary Mask Images

DressRecon: Freeform 4D Human Reconstruction from Monocular Video

A Robust Multi‐View System for High‐Fidelity Human Body Shape Reconstruction

Detailed 3D Human Body Reconstruction from Multi-view Images Combining Voxel Super-Resolution and Learned Implicit Representation

Single-view 3D Body and Cloth Reconstruction under Complex Poses

A Local Correspondence-aware Hybrid CNN-GCN Model for Single-image Human Body Reconstruction

Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies from Single RGB Images