Abstract: 3D GAN inversion aims to project a single image into the latent space of a 3D Generative Adversarial Network (GAN), thereby achieving 3D geometry reconstruction. While there exist encoders that achieve good results in 3D GAN inversion, they are predominantly built on EG3D, which specializes in synthesizing near-frontal views and is limited in synthesizing comprehensive 3D scenes from diverse viewpoints. In contrast to existing approaches, we propose a novel framework built on PanoHead, which excels in synthesizing images from a 360-degree perspective. To achieve realistic 3D modeling of the input image, we introduce a dual encoder system tailored for high-fidelity reconstruction and realistic generation from different viewpoints. Accompanying this, we propose a stitching framework on the triplane domain to get the best predictions from both. To achieve seamless stitching, both encoders must output consistent results despite being specialized for different tasks. For this reason, we carefully train these encoders using specialized losses, including an adversarial loss based on our novel occlusion-aware triplane discriminator. Experiments reveal that our approach surpasses the existing encoder training methods qualitatively and quantitatively. Please visit the project page: https://berkegokmen1.github.io/dual-enc-3d-gan-inv

What problem does this paper attempt to address?

This paper addresses the problem of reconstructing high-fidelity 3D head models from a single image. Specifically, the authors project a single image into the latent space of a 3D Generative Adversarial Network (GAN) to recover high-quality 3D geometry. Existing encoders are predominantly built on EG3D, which performs well when synthesizing near-frontal views but is limited when handling diverse viewpoints and comprehensive 3D scenes. To address this, the authors introduce the following innovations:

1. **Use of PanoHead**: Unlike existing approaches built on EG3D, the authors build on PanoHead, which can synthesize images from a 360-degree perspective and therefore covers diverse viewpoints better.
2. **Dual-encoder system**: To achieve both high-fidelity reconstruction and realistic multi-view generation, the authors design a dual-encoder system. One encoder focuses on reconstructing the given view, and the other focuses on generating high-quality results for regions not visible in the input.
3. **Triplane stitching mechanism**: To combine the outputs of the two encoders seamlessly, the authors propose a stitching mechanism in the triplane domain that keeps the final result consistent and coherent.
4. **Occlusion-aware triplane discriminator**: To improve reconstruction quality and realism, the authors introduce a new occlusion-aware triplane discriminator, which is used to train the encoders to produce consistent and complementary outputs.

### Main contributions of the paper

- A dual-encoder system that combines the strengths of the two encoders, achieving high-fidelity reconstruction of the input image together with realistic multi-view generation.
- An occlusion-aware triplane discriminator that improves the realism and consistency of the reconstruction.
- Extensive experiments verifying the framework, with quantitative and qualitative results superior to existing encoder training methods.

### Formula representation

The formulas involved in the paper are represented in Markdown format as follows:

- Reconstruction loss function:

  \[ \arg \min_{E_1} L_{LPIPS}(I_{sv}^{final}, I)+L_2(I_{sv}^{final}, I)+L_{identity}(I_{sv}^{final}, I) \]

- Adversarial loss function:

  \[ \arg \min_{E_2} \max_D L_{adv}(O_{\pi_R} T_{sv}^{final}, O_{\pi_R} T_{synth}) \]

The first objective is minimized over encoder \(E_1\) and keeps the output faithful to the input image. The second is a min-max game between encoder \(E_2\) and the discriminator \(D\) and keeps the output realistic under different views.

### Summary

This paper tackles the challenge of reconstructing high-fidelity 3D head models from a single image by introducing a dual-encoder system and an occlusion-aware triplane discriminator, and it shows potential in application scenarios such as film production, Augmented Reality (AR), and Virtual Reality (VR).
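### Illustrative sketch of stitching and the reconstruction objective

The triplane stitching step and the reconstruction objective described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `(3, C, H, W)` triplane layout, and the use of a soft visibility mask to blend the two encoders' outputs are assumptions made for illustration. The perceptual (LPIPS) and identity terms are passed in as hypothetical callables rather than real networks.

```python
import numpy as np


def stitch_triplanes(t_visible, t_occluded, visibility_mask):
    """Blend two triplane feature tensors with a per-location mask.

    t_visible:       features from the encoder specialised for the input view
    t_occluded:      features from the encoder specialised for unseen regions
    visibility_mask: values in [0, 1]; 1 keeps t_visible, 0 keeps t_occluded
    The mask must be broadcastable to the shape of t_visible.
    (Assumed blending rule; the paper's exact stitching may differ.)
    """
    mask = np.clip(np.asarray(visibility_mask, dtype=t_visible.dtype), 0.0, 1.0)
    return mask * t_visible + (1.0 - mask) * t_occluded


def reconstruction_loss(pred, target, extra_terms=()):
    """Mean-squared L2 term plus any extra terms, summed with unit weights.

    extra_terms is a sequence of callables f(pred, target) -> scalar that
    stand in for the LPIPS and identity losses (hypothetical placeholders).
    """
    loss = float(np.mean((pred - target) ** 2))
    for term in extra_terms:
        loss += float(term(pred, target))
    return loss


if __name__ == "__main__":
    # Hypothetical sizes: 3 planes, 2 channels, 4x4 resolution.
    t1 = np.ones((3, 2, 4, 4), dtype=np.float32)
    t2 = np.zeros((3, 2, 4, 4), dtype=np.float32)
    mask = np.zeros((3, 1, 4, 4), dtype=np.float32)
    mask[..., :2] = 1.0  # left half treated as visible in the input view
    stitched = stitch_triplanes(t1, t2, mask)
    print(stitched[0, 0])
    print(reconstruction_loss(stitched, t1))
```

The mask has a single channel so that it broadcasts across all feature channels of each plane. A soft mask with values between 0 and 1 would give a smooth transition at the boundary between visible and occluded regions, which is one plausible way to keep the stitched result seamless.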

Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images

Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding

TriPlaneNet: An Encoder for EG3D Inversion

3D GAN Inversion with Pose Optimization

3D GAN Inversion with Facial Symmetry Prior

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

Progressive Learning of 3D Reconstruction Network from 2D GAN Data

InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars

Monocular 3D Object Reconstruction with GAN Inversion

PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$

Meta-Auxiliary Network for 3D GAN Inversion

Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

ReE3D: Boosting Novel View Synthesis for Monocular Images Using Residual Encoders

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Learning Full-Head 3D GANs from a Single-View Portrait Dataset

Self-supervised single-view 3D point cloud reconstruction through GAN inversion

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Reference-Based 3D-Aware Image Editing with Triplanes

In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing