Abstract: 3D GAN inversion aims to project a single image into the latent space of a 3D Generative Adversarial Network (GAN), thereby achieving 3D geometry reconstruction. While there exist encoders that achieve good results in 3D GAN inversion, they are predominantly built on EG3D, which specializes in synthesizing near-frontal views and is limiting in synthesizing comprehensive 3D scenes from diverse viewpoints. In contrast to existing approaches, we propose a novel framework built on PanoHead, which excels in synthesizing images from a 360-degree perspective. To achieve realistic 3D modeling of the input image, we introduce a dual encoder system tailored for high-fidelity reconstruction and realistic generation from different viewpoints. Accompanying this, we propose a stitching framework on the triplane domain to get the best predictions from both. To achieve seamless stitching, both encoders must output consistent results despite being specialized for different tasks. For this reason, we carefully train these encoders using specialized losses, including an adversarial loss based on our novel occlusion-aware triplane discriminator. Experiments reveal that our approach surpasses the existing encoder training methods qualitatively and quantitatively. Please visit the project page: https://berkegokmen1.github.io/dual-enc-3d-gan-inv
Computer Vision and Pattern Recognition, Computational Geometry, Graphics, Machine Learning, Image and Video Processing
### What problem does this paper attempt to solve?
This paper addresses the problem of reconstructing high-fidelity 3D head models from a single image. The authors propose a framework that projects a single image into the latent space of a 3D Generative Adversarial Network (GAN) to obtain a high-quality 3D geometric reconstruction. Existing encoders are mainly built on EG3D, which performs well on near-frontal views but is limited when synthesizing comprehensive 3D scenes from diverse viewpoints.
To address these limitations, the authors introduce the following innovations:
1. **Use of PanoHead**: Instead of EG3D, the authors build on PanoHead, which can synthesize images from a 360-degree perspective and therefore covers viewpoints beyond the near-frontal range.
2. **Dual-encoder system**: To achieve both high-fidelity reconstruction and realistic generation from other viewpoints, the authors design a dual-encoder system. One encoder focuses on reconstructing the given view, and the other focuses on generating high-quality unseen views.
3. **Triplane stitching mechanism**: To combine the outputs of the two encoders seamlessly, the authors propose stitching them in the triplane domain, taking the best predictions from each while keeping the final result consistent and coherent.
4. **Occlusion-aware triplane discriminator**: To improve reconstruction quality and realism, the authors introduce a novel occlusion-aware triplane discriminator. It supplies the adversarial loss used to train the encoders so that their outputs are consistent and complementary.
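The idea of combining two encoder outputs in the triplane domain can be illustrated with a minimal sketch. This summary does not specify how the paper performs the stitching, so the mask-weighted blend below is only an assumption chosen for illustration. The names `stitch_planes`, `stitch_triplane`, and the `mask` argument are hypothetical, and small nested lists stand in for real feature tensors.

```python
from typing import List

# Toy stand-in for one 2D feature plane of a triplane (height x width).
Plane = List[List[float]]


def stitch_planes(visible: Plane, unseen: Plane, mask: Plane) -> Plane:
    """Blend two feature planes with a per-cell mask in [0, 1].

    ASSUMPTION: a linear blend, not necessarily the paper's stitching rule.
    mask = 1 keeps the feature from the encoder that reconstructs the input
    view; mask = 0 keeps the feature from the encoder for unseen views.
    """
    if not (len(visible) == len(unseen) == len(mask)):
        raise ValueError("planes and mask must have the same height")
    stitched: Plane = []
    for v_row, u_row, m_row in zip(visible, unseen, mask):
        if not (len(v_row) == len(u_row) == len(m_row)):
            raise ValueError("planes and mask must have the same width")
        stitched.append(
            [m * v + (1.0 - m) * u for v, u, m in zip(v_row, u_row, m_row)]
        )
    return stitched


def stitch_triplane(
    t_visible: List[Plane], t_unseen: List[Plane], masks: List[Plane]
) -> List[Plane]:
    """Apply the blend independently to each plane of a triplane."""
    if not (len(t_visible) == len(t_unseen) == len(masks)):
        raise ValueError("triplanes and masks must have the same plane count")
    return [
        stitch_planes(v, u, m) for v, u, m in zip(t_visible, t_unseen, masks)
    ]
```

A blend of this kind only looks seamless if the two planes agree where the mask transitions, which is consistent with the abstract's point that both encoders must output consistent results.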
### Main contributions of the paper
- Proposes a dual-encoder system that combines the strengths of two specialized encoders, achieving high-fidelity reconstruction of the input image together with realistic multi-view generation.
- Introduces an occlusion-aware triplane discriminator that improves the realism and consistency of the reconstruction.
- Validates the framework through extensive experiments, with quantitative and qualitative results superior to existing encoder training methods.
### Formula representation
The paper's training objectives include the following:
- Reconstruction loss function:
\[
\arg\min_{E_1} \; L_{\mathrm{LPIPS}}(I_{sv}^{\mathrm{final}}, I) + L_{2}(I_{sv}^{\mathrm{final}}, I) + L_{\mathrm{identity}}(I_{sv}^{\mathrm{final}}, I)
\]
- Adversarial loss function:
\[
\arg\min_{E_2} \max_{D} \; L_{\mathrm{adv}}(O_{\pi_R} T_{sv}^{\mathrm{final}}, O_{\pi_R} T_{\mathrm{synth}})
\]
The first objective trains encoder $E_1$ so that the reconstructed image $I_{sv}^{\mathrm{final}}$ matches the input image $I$ under perceptual (LPIPS), pixel-wise ($L_2$), and identity losses. The second trains encoder $E_2$ adversarially against the discriminator $D$, so that its output remains realistic under different views.
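The two objectives can be sketched in plain Python to make their structure concrete. Several parts are assumptions rather than details taken from the paper: LPIPS and the identity loss need pretrained networks, so they are passed in as hypothetical callables (`lpips_fn`, `identity_fn`). The operator $O_{\pi_R}$ is treated as an elementwise mask. The adversarial term uses the common non-saturating logistic GAN loss, with synthesized triplanes assumed to play the role of real samples. Flat lists of floats stand in for images and triplanes.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
PairLoss = Callable[[Vector, Vector], float]


def l2_loss(pred: Vector, target: Vector) -> float:
    """Mean squared error between two flattened images."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def reconstruction_loss(
    pred: Vector, target: Vector, lpips_fn: PairLoss, identity_fn: PairLoss
) -> float:
    """Unweighted sum of the three terms in the reconstruction objective.

    lpips_fn and identity_fn are hypothetical stand-ins for network-based
    losses; any loss weights used in practice are omitted here.
    """
    return (
        lpips_fn(pred, target)
        + l2_loss(pred, target)
        + identity_fn(pred, target)
    )


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def apply_occlusion_mask(triplane: Vector, mask: Vector) -> List[float]:
    """ASSUMPTION: model the operator O as an elementwise mask."""
    if len(triplane) != len(mask):
        raise ValueError("triplane and mask must have the same length")
    return [m * t for m, t in zip(mask, triplane)]


def discriminator_loss(real_logits: Vector, fake_logits: Vector) -> float:
    """Logistic loss for D: score synthesized triplanes high, encoded low."""
    real = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real + fake


def encoder_adversarial_loss(fake_logits: Vector) -> float:
    """Non-saturating loss for the encoder: make D score its output high."""
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)
```

In a training loop following this sketch, the discriminator and the encoder would be updated in alternation, each minimizing its own function above.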
### Summary
This paper addresses the challenge of reconstructing high-fidelity 3D head models from a single image by introducing a dual-encoder system and an occlusion-aware triplane discriminator. It shows potential for applications such as film production, Augmented Reality (AR), and Virtual Reality (VR).