Abstract: Efficiently generating a freestyle 3D portrait with high quality and 3D consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are trained on specific facial datasets such as FFHQ. To obtain diverse 3D portraits, one can build a large-scale multi-style database to retrain a 3D-aware generator, or use an off-the-shelf tool to perform style translation. However, the former is time-consuming because of data collection and training, while the latter may destroy multi-view consistency. To tackle this problem, we propose a novel text-driven 3D-aware portrait synthesis framework that can generate out-of-distribution portrait styles. Specifically, for a given portrait style prompt, we first composite two generative priors, a 3D-aware GAN generator and a text-guided image editor, to quickly construct a few-shot stylized portrait set. Then we map the specific style domain of this set to our proposed 3D latent feature generator and obtain a 3D representation containing the given style information. Finally, we use a pre-trained 3D renderer to generate view-consistent stylized portraits from the 3D representation. Extensive experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state of the art.

What problem does this paper attempt to address?

This paper addresses the challenge of efficiently generating high-quality, 3D-consistent freestyle 3D portraits. Existing methods are usually limited by their 3D generators, which are trained on specific facial datasets (such as FFHQ), and this restricts the diversity of the portrait styles they can produce. To obtain diverse 3D portraits, researchers can either build a large-scale multi-style database and retrain a 3D-aware generator, or apply an off-the-shelf tool for style transfer. The former is time-consuming because of data collection and training, and the latter may break multi-view consistency. The paper therefore proposes a novel text-driven 3D-aware portrait synthesis framework that aims to generate freestyle 3D portraits at low cost.

### Specific problems solved by the paper:

1. **Generating diverse 3D portraits**: Existing 3D portrait generation methods are often limited to the style of a specific dataset, so the generated portraits share a single style. The proposed method generates diverse 3D portraits from text prompts, removing this limitation.
2. **Maintaining multi-view consistency**: Keeping a stylized portrait consistent across views is difficult. The paper addresses this by optimizing the inference process of Instruct-pix2pix and by designing a 3D latent feature generator.
3. **Efficient generation**: Traditional approaches either require large amounts of time and resources to collect data and train a model, or break 3D consistency during style transfer. The proposed method generates high-quality 3D portraits within a few minutes while maintaining 3D consistency.

### Overview of the solution:

1. **Combining generative priors**: Two pre-trained generative models, EG3D and Instruct-pix2pix, are used to quickly build a few-shot multi-view portrait dataset in the given style.
2. **Optimizing Instruct-pix2pix inference**: New noise and enhanced prompts are introduced into the inference process of Instruct-pix2pix so that the stylized results are more stable and consistent across views.
3. **3D latent feature generator**: A 3D latent feature generator maps the style information in the few-shot stylized portrait dataset to a 3D implicit representation. After pre-training and fine-tuning, this generator can quickly produce high-quality, 3D-consistent stylized portraits.

### Main contributions:

- A 3D-aware portrait synthesis framework based on compositional generative priors, which generates freestyle 3D portraits driven by text.
- A 3D latent feature generator that can be fine-tuned rapidly to map out-of-distribution styles to 3D representations.
- Clear advantages in performance and efficiency over the baseline methods for stylized 3D portrait synthesis.

Together, these components address the problem of generating diverse, high-quality, and 3D-consistent freestyle 3D portraits.
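
The three steps in the overview can be pictured as a simple data flow. The following is only a toy sketch of that flow, not the paper's implementation: every function name is hypothetical, "images" are short lists of numbers, the two pre-trained priors (EG3D and Instruct-pix2pix) are replaced by trivial arithmetic, and the 3D latent feature generator is reduced to a single learnable style offset.

```python
import random


def render_prior(latent, pose):
    """Hypothetical stand-in for a pre-trained 3D-aware GAN render at one camera pose."""
    return [z + 0.1 * pose for z in latent]


def edit_with_text(image, style_shift):
    """Hypothetical stand-in for a text-guided image editor: adds a fixed style offset."""
    return [p + style_shift for p in image]


def build_few_shot_set(n_ids, poses, style_shift, rng):
    """Step 1: composite the two priors into a small stylized multi-view set."""
    dataset = []
    for _ in range(n_ids):
        latent = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        for pose in poses:
            target = edit_with_text(render_prior(latent, pose), style_shift)
            dataset.append((latent, pose, target))
    return dataset


def render_stylized(latent, pose, style):
    """Step 3: render with the learned style applied identically at every pose."""
    return [p + style for p in render_prior(latent, pose)]


def fine_tune(dataset, steps=200, lr=0.1):
    """Step 2 (toy): fit one style parameter to the stylized set by gradient descent."""
    style = 0.0
    for _ in range(steps):
        grad = 0.0
        for latent, pose, target in dataset:
            pred = render_stylized(latent, pose, style)
            # Gradient of the mean squared error with respect to the style offset.
            grad += sum(2.0 * (a - b) for a, b in zip(pred, target)) / len(pred)
        style -= lr * grad / len(dataset)
    return style


def run_demo():
    """Build the toy few-shot set for a style offset of 0.5 and recover it."""
    rng = random.Random(0)
    dataset = build_few_shot_set(n_ids=3, poses=[-0.5, 0.0, 0.5], style_shift=0.5, rng=rng)
    return fine_tune(dataset)
```

Calling `run_demo()` returns a learned style value close to the offset used to build the toy set. Because the learned parameter lives outside the per-view rendering step, it is applied in the same way at every pose, which is the intuition behind learning style in the 3D representation rather than editing each rendered view independently.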

Freestyle 3D-Aware Portrait Synthesis Based on Compositional Generative Priors

Feature-Based Automatic Portrait Generation System

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution

Fast 3D Stylized Gaussian Portrait Generation From a Single Image With Style Aligned Sampling Loss

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model.

Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

Learning Full-Head 3D GANs from a Single-View Portrait Dataset

Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation

Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation

SofGAN: A Portrait Image Generator with Dynamic Styling

Generating Animatable 3D Cartoon Faces from Single Portraits

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

A Free Viewpoint Portrait Generator with Dynamic Styling.

Portrait Video Editing Empowered by Multimodal Generative Priors