Freestyle 3D-Aware Portrait Synthesis Based on Compositional Generative Priors

Tianxiang Ma, Kang Zhao, Jianxin Sun, Yingya Zhang, Jing Dong
DOI: https://doi.org/10.48550/arXiv.2306.15419
2023-12-24
Abstract: Efficiently generating a freestyle 3D portrait with high quality and 3D-consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are learned on specific facial datasets, such as FFHQ. To obtain diverse 3D portraits, one can build a large-scale multi-style database to retrain a 3D-aware generator, or use an off-the-shelf tool to do the style translation. However, the former is time-consuming due to the data collection and training process, while the latter may destroy the multi-view consistency. To tackle this problem, we propose a novel text-driven 3D-aware portrait synthesis framework that can generate out-of-distribution portrait styles. Specifically, for a given portrait style prompt, we first composite two generative priors, a 3D-aware GAN generator and a text-guided image editor, to quickly construct a few-shot stylized portrait set. Then we map the special style domain of this set to our proposed 3D latent feature generator and obtain a 3D representation containing the given style information. Finally, we use a pre-trained 3D renderer to generate view-consistent stylized portraits from the 3D representation. Extensive experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state-of-the-art.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenge of efficiently generating high-quality, 3D-consistent freestyle 3D portraits. Existing methods are usually limited by their 3D generators, which are trained on specific facial datasets (such as FFHQ), and this restricts the diversity of portrait styles they can produce. To obtain diverse 3D portraits, researchers can either build a large-scale multi-style database to retrain a 3D-aware generator or use off-the-shelf tools for style transfer. However, the former is time-consuming because of the data collection and training process, and the latter may break multi-view consistency. The paper therefore proposes a novel text-driven 3D-aware portrait synthesis framework that aims to generate freestyle 3D portraits at low cost.

### Specific problems solved by the paper:

1. **Generating diverse 3D portraits**: Existing 3D portrait generation methods are often limited to the style of a specific dataset, so the generated portraits share a single style. The proposed method generates diverse 3D portraits from text prompts, which removes this limitation.
2. **Maintaining multi-view consistency**: Keeping a portrait consistent across different views is a challenge in 3D portrait generation. The paper addresses this by optimizing the inference process of Instruct-pix2pix and by designing a 3D latent feature generator.
3. **Efficient generation**: Traditional approaches either require a large amount of time and resources for data collection and training, or break 3D consistency during style transfer. The proposed method generates high-quality 3D portraits within a few minutes while maintaining 3D consistency.

### Overview of the solution:

1. **Combining generative priors**: Two pre-trained generative models, EG3D and Instruct-pix2pix, are used to quickly build a few-shot multi-view portrait dataset in a given style.
2. **Optimizing Instruct-pix2pix inference**: By introducing new noise and enhanced prompts, the inference process of Instruct-pix2pix is optimized to produce more stable and consistent stylized results across different views.
3. **3D latent feature generator**: A 3D latent feature generator is designed to map the style information in the few-shot stylized portrait dataset to the 3D implicit representation. Through pre-training and fine-tuning, this generator can quickly produce high-quality, 3D-consistent stylized portraits.

### Main contributions:

- A 3D-aware portrait synthesis framework based on combined generative priors, which can generate freestyle 3D portraits driven by text.
- A 3D latent feature generator that allows rapid fine-tuning to map out-of-distribution styles to 3D representations.
- Clear advantages in performance and efficiency over the baseline methods for stylized 3D portrait synthesis.

With these components, the paper addresses the problem of generating diverse, high-quality, and 3D-consistent freestyle 3D portraits.
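The first step of the solution, composing a 3D-aware generator with a text-guided image editor to build a few-shot stylized multi-view set, can be illustrated with a minimal data-flow sketch. This is not the authors' code: `sample_poses`, `build_few_shot_set`, `StylizedSample`, the (yaw, pitch) pose simplification, and the pose ranges are all hypothetical, and the two priors are passed in as plain callables so that the sketch runs without any model weights.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical simplification: a camera pose is just (yaw, pitch) in radians.
Pose = Tuple[float, float]


@dataclass
class StylizedSample:
    pose: Pose   # camera pose the view was rendered from
    image: Any   # stylized image (whatever type the editor returns)


def sample_poses(n: int, max_yaw: float = 0.6, max_pitch: float = 0.3,
                 seed: int = 0) -> List[Pose]:
    """Draw n poses uniformly within an assumed near-frontal range."""
    rng = random.Random(seed)
    return [(rng.uniform(-max_yaw, max_yaw), rng.uniform(-max_pitch, max_pitch))
            for _ in range(n)]


def build_few_shot_set(render_view: Callable[[Pose], Any],
                       stylize: Callable[[Any, str], Any],
                       prompt: str, n_views: int = 8,
                       seed: int = 0) -> List[StylizedSample]:
    """Render each pose with the 3D-aware prior, then edit it with the text-guided prior."""
    samples = []
    for pose in sample_poses(n_views, seed=seed):
        view = render_view(pose)        # stand-in for the 3D-aware GAN generator
        styled = stylize(view, prompt)  # stand-in for the text-guided image editor
        samples.append(StylizedSample(pose=pose, image=styled))
    return samples


if __name__ == "__main__":
    # Dummy priors: they only record what they were asked to do.
    fake_render = lambda pose: {"pose": pose}
    fake_stylize = lambda view, prompt: {**view, "style": prompt}
    data = build_few_shot_set(fake_render, fake_stylize,
                              "a clay sculpture portrait", n_views=4)
    for sample in data:
        print(sample.pose, sample.image["style"])
```

Keeping the pose alongside each stylized image reflects the idea that the later fine-tuning stage needs to know which view each image corresponds to; how the paper actually stores and samples views is not specified in this summary.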