Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations

Yu Feng, Shunsi Zhang, Jian Shu, Hanfeng Zhao, Guoliang Pang, Chi Zhang, Hao Wang
2024-12-04
Abstract: Generating multi-view human images from a single view is a complex and significant challenge. Although recent advancements in multi-view object generation have shown impressive results with diffusion models, novel view synthesis for humans remains constrained by the limited availability of 3D human datasets. Consequently, many existing models struggle to produce realistic human body shapes or capture fine-grained facial details accurately. To address these issues, we propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis. Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation, aiming to extend the 2D knowledge of the single-view model to a multi-view diffusion model. Additionally, to enhance the model's detail restoration capability, we integrate transferred multimodal facial features into our trained human diffusion model. Experimental evaluations on benchmark datasets demonstrate that our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
Generating multi-view human images from a single view is a complex and challenging task. Although multi-view object generation has progressed significantly in recent years, multi-view human generation is still constrained by the limited availability of 3D human datasets. As a result, many existing models have difficulty generating realistic body shapes or accurately capturing facial details. Specifically, the paper aims to solve two main problems:

1. **How to generate high-quality results with a limited-scale 3D human dataset**: Because large-scale multi-view human data is scarce, existing object-based methods may produce incomplete or distorted occluded areas when applied directly to human bodies.
2. **How to accurately capture fine facial details**: Existing methods usually do not focus on facial information, which results in blurry faces and distorted expressions.

To solve these problems, the authors propose a framework that achieves multi-view human synthesis by learning transferred body and facial representations. The specific methods include:

- Using a pre-trained single-view human model to develop multi-view body representations, extending the 2D knowledge of the single-view model to the multi-view diffusion model.
- Integrating transferred multimodal facial features into the trained human diffusion model to enhance the model's detail recovery ability.

### Method Overview

The paper proposes a two-stage learning framework:

1. **Transferring body representations**: A pre-trained single-view human model is used to learn transferred multi-view body representations. The weights of the single-view model are transferred to the multi-view model and combined with normal maps generated by the SMPL model, which model the rough body shape and pose.
2. **Transferring facial representations**: 2D and 3D facial features are integrated to provide structurally accurate and identity-preserving information, thereby enhancing facial representations. Specifically, 3D priors (such as 3DMM) provide robust facial structure, while 2D priors provide identity information in the semantic space.

With this method, the paper's experiments on the THuman2.1 and 2K2K datasets show state-of-the-art performance in multi-view human synthesis, especially in the restoration of facial details.

### Experimental Results

The experiments present quantitative and qualitative comparisons on the THuman2.1 and 2K2K datasets. On the quantitative metrics (PSNR, SSIM, LPIPS), the method outperforms the baseline methods across the board. The qualitative results also show that the method generates more realistic facial details, especially in side views.

### Summary

By introducing transferred body and facial representations, the paper addresses two challenges in multi-view human generation: limited 3D data and the restoration of facial details.
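
As a point of reference for the quantitative metrics named above, the following is a minimal sketch of how PSNR is conventionally computed between a ground-truth view and a generated view. It is not taken from the paper: the function name `psnr`, the flattened-pixel input format, and the 8-bit intensity range (`max_value = 255`) are assumptions made for illustration, and the paper's actual evaluation protocol (resolution, masking, color space) is not specified in this summary. SSIM and LPIPS are omitted because they are more involved; LPIPS in particular depends on a pretrained network.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         generated: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumption: both inputs are flat sequences of pixel intensities in
    [0, max_value]; this is an illustrative convention, not the paper's.
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100.0, 100.0, 100.0, 100.0]
    close_view = [101.0, 99.0, 100.0, 100.0]
    far_view = [140.0, 60.0, 100.0, 100.0]
    # A generated view closer to the ground truth yields a higher PSNR.
    print(psnr(ground_truth, close_view) > psnr(ground_truth, far_view))
```

Higher PSNR and SSIM indicate closer agreement with the ground-truth view, while lower LPIPS indicates better perceptual similarity.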