Abstract: Synthesis and reconstruction of 3D human heads have gained increasing interest in computer vision and computer graphics recently. Existing state-of-the-art 3D generative adversarial networks (GANs) for 3D human head synthesis are either limited to near-frontal views or struggle to preserve 3D consistency at large view angles. We propose PanoHead, the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in $360^\circ$ with diverse appearance and detailed geometry using only in-the-wild unstructured images for training. At its core, we lift the representation power of recent 3D GANs and bridge the data alignment gap when training from in-the-wild images with widely distributed views. Specifically, we propose a novel two-stage self-adaptive image alignment for robust 3D GAN training. We further introduce a tri-grid neural volume representation that effectively addresses front-face and back-head feature entanglement rooted in the widely adopted tri-plane formulation. Our method instills prior knowledge of 2D image segmentation into adversarial learning of 3D neural scene structures, enabling compositable head synthesis in diverse backgrounds. Benefiting from these designs, our method significantly outperforms previous 3D GANs, generating high-quality 3D heads with accurate geometry and diverse appearances, even with long wavy and afro hairstyles, renderable from arbitrary poses. Furthermore, we show that our system can reconstruct full 3D heads from single input images for personalized realistic 3D avatars.
What problem does this paper attempt to address?
The paper mainly aims to address the following issues:
1. **Full 3D Head Synthesis**: Existing 3D Generative Adversarial Networks (GANs) perform well for head synthesis from near-frontal views but struggle to maintain 3D consistency with larger view changes. Therefore, the researchers propose a novel 3D-aware generative model called PanoHead, which can achieve high-quality, view-consistent full head image synthesis, covering 360-degree views, and can handle diverse appearances and detailed geometric features.
2. **Background and Foreground Separation**: In previous methods, the foreground (head) and background are easily entangled, which causes problems when synthesizing images from large view angles. To solve this problem, the researchers introduce a foreground-aware tri-discriminator that uses prior knowledge from 2D image segmentation to separate foreground head modeling from background synthesis.
3. **Improvement of 3D Representation**: The tri-plane representation has projection ambiguities under 360-degree views: features of the front face and the back of the head become entangled, leading to artifacts such as "mirrored faces." The paper proposes a new tri-grid neural volume representation to address this issue, improving expressive power while maintaining efficiency.
4. **Camera Alignment Challenges**: For in-the-wild images of the back of the head, obtaining accurate camera extrinsic parameters is extremely difficult, and there is an alignment gap between these images and frontal images, leading to noisy appearances and suboptimal head geometry. The researchers propose a two-stage alignment scheme and a camera self-adaptation module to tackle these challenges.
In summary, the main contribution of this paper is the proposal of a 3D-aware GAN framework called PanoHead, which can be trained from unstructured images in the wild and achieve high-fidelity full head image synthesis, including detailed geometric structures, and can render from any 360-degree view. Additionally, the paper introduces new techniques for separating foreground and background, improved 3D representation methods, and effective image alignment strategies.
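The projection ambiguity mentioned in item 3 can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, feature shapes, nearest-neighbour lookup, and depth of 3 are all assumptions chosen for illustration. It only shows that a point on the front of the head and its mirror point on the back read the same feature from a single XY plane, while a grid with a few slices along the depth axis can give them different features.

```python
import numpy as np

def to_index(coord, size):
    """Map a coordinate in [-1, 1] to the nearest integer index in [0, size - 1]."""
    return int(round((coord + 1.0) / 2.0 * (size - 1)))

def sample_xy_plane(plane, x, y):
    """Look up a (R, R, C) feature plane; the z coordinate is projected away."""
    res = plane.shape[0]
    return plane[to_index(x, res), to_index(y, res)]

def sample_xy_grid(grid, x, y, z):
    """Look up a (D, R, R, C) feature grid; z selects one of D depth slices."""
    depth, res = grid.shape[0], grid.shape[1]
    return grid[to_index(z, depth), to_index(x, res), to_index(y, res)]

# Hypothetical sizes: 8x8 resolution, 4 feature channels, 3 depth slices.
rng = np.random.default_rng(0)
xy_plane = rng.normal(size=(8, 8, 4))
xy_grid = rng.normal(size=(3, 8, 8, 4))

# A point on the front of the head and its mirror point on the back.
front = (0.1, 0.3, 0.8)
back = (0.1, 0.3, -0.8)

plane_front = sample_xy_plane(xy_plane, front[0], front[1])
plane_back = sample_xy_plane(xy_plane, back[0], back[1])
grid_front = sample_xy_grid(xy_grid, *front)
grid_back = sample_xy_grid(xy_grid, *back)

print("plane features identical:", np.array_equal(plane_front, plane_back))
print("grid features identical:", np.array_equal(grid_front, grid_back))
```

In a full tri-plane model the XZ and YZ planes also contribute features, so the two points are not completely indistinguishable; the sketch isolates only the XY-plane contribution, which is the component shared between front and back.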