Abstract: Generating multi-view human images from a single view is a complex and significant challenge. Although recent advancements in multi-view object generation have shown impressive results with diffusion models, novel view synthesis for humans remains constrained by the limited availability of 3D human datasets. Consequently, many existing models struggle to produce realistic human body shapes or capture fine-grained facial details accurately. To address these issues, we propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis. Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation, aiming to extend the 2D knowledge of the single-view model to a multi-view diffusion model. Additionally, to enhance the model's detail restoration capability, we integrate transferred multimodal facial features into our trained human diffusion model. Experimental evaluations on benchmark datasets demonstrate that our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
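The abstract's idea of extending a single-view model's 2D knowledge to a multi-view diffusion model can be illustrated with a weight-sharing trick that is common in multi-view diffusion work: keep the pretrained self-attention projections unchanged and let tokens attend across all views of the same subject. The summary does not describe the paper's actual layers, so the NumPy sketch below is an assumption, not the authors' implementation. The function names are hypothetical, attention is single-head, and views of one subject are assumed to be stored contiguously along the batch axis.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over the token axis of x: (batch, tokens, channels)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def multiview_self_attention(x, num_views, w_q, w_k, w_v):
    """Hypothetical cross-view attention that reuses single-view weights.

    x has shape (batch * num_views, tokens, channels), the layout a single-view
    model uses when each view is treated as its own image. The views of one
    subject (assumed contiguous along axis 0) are concatenated along the token
    axis, so every token attends to tokens in all views of that subject.
    """
    bv, n, c = x.shape
    b = bv // num_views
    joint = x.reshape(b, num_views * n, c)       # (batch, views * tokens, channels)
    out = self_attention(joint, w_q, w_k, w_v)   # same weights as the single-view layer
    return out.reshape(bv, n, -1)                # back to the per-view layout
```

With `num_views=1` the function is exactly the original single-view attention, and if every view is an identical copy the output also equals the single-view result. That property is why a layer of this kind can be initialized directly from pretrained single-view weights.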

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: generating multi-view human images from a single view is a complex and challenging task. Although significant progress has been made recently in multi-view object generation, multi-view generation of humans is still constrained by the limited availability of 3D human datasets. As a result, many existing models struggle to generate realistic human body shapes or to capture facial details accurately.

Specifically, the paper aims to solve the following two problems:

1. **How to generate high-quality results with a limited-scale 3D human dataset**: Because large-scale multi-view human data are scarce, object-based methods applied directly to humans may produce incomplete or distorted results in occluded regions.
2. **How to accurately capture fine facial details**: Existing methods usually do not focus on facial information, which results in blurry faces and distorted expressions.

To solve these problems, the authors propose a framework that achieves multi-view human synthesis by learning transferred body and facial representations. Specifically:

- A single-view human model pretrained on a large-scale human dataset is used to develop multi-view body representations, extending the 2D knowledge of the single-view model to a multi-view diffusion model.
- Transferred multimodal facial features are integrated into the trained human diffusion model to enhance its ability to restore details.

### Method Overview

The paper proposes a two-stage learning framework:

1. **Transferring body representations**: A pretrained single-view human model is used to learn transferred multi-view body representations. The weights of the single-view model are transferred to the multi-view model, and normal maps obtained from the SMPL model are used to model the coarse body shape and pose.
2. **Transferring facial representations**: 2D and 3D facial features are integrated to provide structurally accurate, identity-preserving information, thereby enhancing the facial representation. 3D priors (such as 3DMM) provide robust facial structure, while 2D priors provide identity information in the semantic space.

With this approach, the paper's experiments on the THuman2.1 and 2K2K datasets show that the method achieves state-of-the-art performance in multi-view human synthesis, especially in the restoration of facial details.

### Experimental Results

The experiments present quantitative and qualitative comparisons on the THuman2.1 and 2K2K datasets. On the quantitative metrics (PSNR, SSIM, LPIPS), the method outperforms the baseline methods on all evaluation metrics. Qualitative results also show that the method generates more realistic facial details, especially in side views.

### Summary

By introducing transferred body and facial representations, this paper addresses the main challenges of multi-view human generation, in particular the limited availability of 3D data and the restoration of facial details.
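The body-representation step combines transferred single-view weights with SMPL normal maps as an extra condition. A widely used way to add a new image-like condition to a pretrained diffusion network without disturbing it is to concatenate the condition along the channel axis and give the first layer zero-initialized weights for the new channels. The summary does not say the paper does this, so treat the sketch below as an assumption: the first layer is simplified to a 1x1 convolution, the normal-map input is assumed to already match the latent's spatial size, and the channel counts are hypothetical.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (batch, c_in, h, w); weight: (c_out, c_in); bias: (c_out,)."""
    return np.einsum("oc,bchw->bohw", weight, x) + bias[None, :, None, None]


def expand_input_weights(weight, extra_in):
    """Append zero-initialized input channels to a pretrained first layer.

    Because the new columns start at zero, before fine-tuning the expanded
    layer ignores the extra (e.g. normal-map) channels and reproduces the
    pretrained layer exactly.
    """
    c_out = weight.shape[0]
    zeros = np.zeros((c_out, extra_in), dtype=weight.dtype)
    return np.concatenate([weight, zeros], axis=1)


# Hypothetical usage: 4 latent channels plus 4 normal-map channels.
rng = np.random.default_rng(0)
w_pre, b_pre = rng.standard_normal((6, 4)), rng.standard_normal(6)
latent = rng.standard_normal((2, 4, 8, 8))
normal = rng.standard_normal((2, 4, 8, 8))
w_new = expand_input_weights(w_pre, extra_in=4)
y = conv1x1(np.concatenate([latent, normal], axis=1), w_new, b_pre)
```

Training then updates the new columns so the network can learn to use the body-shape and pose cue, starting from the pretrained behavior rather than from a random perturbation of it.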
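The 3D facial prior mentioned in the method overview is a 3D Morphable Model (3DMM). In its commonly used linear form, a face is a mean shape and texture plus weighted combinations of basis vectors, where $\bar{\mathbf{S}}$ and $\bar{\mathbf{T}}$ are the mean shape and texture, $\mathbf{B}_{\mathrm{id}}$, $\mathbf{B}_{\mathrm{exp}}$ and $\mathbf{B}_{\mathrm{tex}}$ are identity, expression and texture bases, and $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\boldsymbol{\delta}$ are coefficients estimated from an image. This is general background; the summary does not state which 3DMM variant or which coefficients the paper feeds to its model.

```latex
\mathbf{S}(\boldsymbol{\alpha}, \boldsymbol{\beta})
  = \bar{\mathbf{S}} + \mathbf{B}_{\mathrm{id}}\,\boldsymbol{\alpha} + \mathbf{B}_{\mathrm{exp}}\,\boldsymbol{\beta},
\qquad
\mathbf{T}(\boldsymbol{\delta}) = \bar{\mathbf{T}} + \mathbf{B}_{\mathrm{tex}}\,\boldsymbol{\delta}
```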
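The facial-representation step fuses a 2D identity feature (semantic prior) with 3D structural information (such as 3DMM coefficients). One plausible way to inject both into a diffusion model is to project each modality to a token of a shared width and let the image tokens cross-attend to them. This is an illustrative assumption, not the paper's documented design: the function names, the one-token-per-modality choice, and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def face_condition_tokens(id_embed, coeffs_3dmm, w_id, w_3d):
    """Hypothetical fusion: map each modality to one token of width d and stack them.

    id_embed:    (batch, d_id)  2D identity feature
    coeffs_3dmm: (batch, d_3d)  3DMM coefficients
    returns:     (batch, 2, d)  one identity token and one structure token
    """
    id_tok = id_embed @ w_id         # (batch, d)
    geo_tok = coeffs_3dmm @ w_3d     # (batch, d)
    return np.stack([id_tok, geo_tok], axis=1)


def cross_attention(image_tokens, cond_tokens, w_q, w_k, w_v):
    """Image tokens (batch, n, d) attend to conditioning tokens (batch, m, d)."""
    q = image_tokens @ w_q
    k = cond_tokens @ w_k
    v = cond_tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (batch, n, m)
    return image_tokens + softmax(scores) @ v                 # residual update
```

Keeping the two modalities as separate tokens lets the attention weights decide, per image location, how much to rely on identity versus structure.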
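Of the three reported metrics, PSNR has a simple closed form, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where higher is better. A minimal NumPy sketch follows; it assumes both images share shape and value range and is not the paper's evaluation code, which may differ in details such as background masking or value range. SSIM and LPIPS (the latter relies on features from a pretrained network) need more machinery and are omitted.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For example, a prediction that is off by 0.1 everywhere on a [0, 1] image has an MSE of 0.01 and therefore a PSNR of about 20 dB.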

Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations
