Abstract: Inferring 3D object structures from a single image is an ill-posed task due to depth ambiguity and occlusion. Typical resolutions in the literature include leveraging 2D or 3D ground truth for supervised learning, as well as imposing hand-crafted symmetry priors or using an implicit representation to hallucinate novel viewpoints for unsupervised methods. In this work, we propose a general adversarial learning framework for solving Unsupervised 2D to Explicit 3D Style Transfer (UE3DST). Specifically, we merge two architectures: the unsupervised explicit 3D reconstruction network of Wu et al. and the Generative Adversarial Network (GAN) named StarGAN-v2. We experiment across three facial datasets (Basel Face Model, 3DFAW and CelebA-HQ) and show that our solution is able to outperform well-established solutions such as DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer, while we also justify the individual contributions of our model components via ablation. In contrast to the aforementioned baselines, our scheme produces features for explicit 3D rendering, which can be manipulated and utilized in downstream tasks.

What problem does this paper attempt to address?

This paper addresses unsupervised explicit 3D face reconstruction and style transfer from a single image, a task the authors call Unsupervised 2D to Explicit 3D Style Transfer (UE3DST). Specifically, it proposes a general adversarial learning framework that predicts the features of the 3D rendering process (such as albedo, depth, shadow, and surface normal) from a single 2D image, so that the 3D object can be reconstructed and images with new styles, including changes in shape and appearance, can be synthesized. Existing work usually tackles this task either with supervised learning on 2D or 3D ground truth or, in the unsupervised setting, by imposing hand-crafted symmetry priors or by using implicit representations to hallucinate novel viewpoints. These methods therefore either require supervision or depend on hand-crafted priors or implicit 3D representations, whereas the method in this paper performs explicit 3D reconstruction and style transfer without supervision, which is a more challenging task. The main contributions of the paper are as follows:

1. **Proposing a framework that combines an unsupervised explicit 3D reconstruction network and a generative adversarial network (GAN)**: The framework merges the reconstruction network of Wu et al. with StarGAN-v2. It recovers the 3D structure from a single 2D image without relying on labeled data and can change the style of the reconstructed 3D object.
2. **Experimental verification of the effectiveness of the method on multiple facial datasets**: The authors conducted experiments on three datasets, namely Basel Face Model, 3DFAW, and CelebA-HQ. The results show that the proposed method outperforms DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer.
3. **Evaluating the contributions of each component of the model through ablation studies**: The authors use ablation studies to justify the individual contribution of each model component.

In summary, this paper targets unsupervised explicit 3D reconstruction and style transfer from a single image. Its adversarial learning framework, which combines unsupervised 3D reconstruction with style transfer, is reported to outperform the DepthNet and Pix2NeRF baselines across the three facial datasets.
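To make "features for explicit 3D rendering" more concrete, the following is a minimal sketch of how an albedo map and a depth map could be combined into an image with simple Lambertian lighting. It is an illustration only, not the paper's renderer: the function names `normals_from_depth` and `render`, the ambient and diffuse weights, the finite-difference normals, and the sign convention for the normals are all assumptions made for this example.

```python
import math


def normals_from_depth(depth):
    """Estimate per-pixel unit normals from a depth map given as a list of rows.

    Assumption: normals are taken as (-dz/dx, -dz/dy, 1), normalized,
    with gradients from finite differences clamped at the image border.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals


def render(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Shade each pixel as albedo * (ambient + diffuse * max(0, n . l)).

    The ambient/diffuse weights are hypothetical defaults for illustration.
    """
    length = math.sqrt(sum(c * c for c in light_dir))
    light = tuple(c / length for c in light_dir)
    normals = normals_from_depth(depth)
    image = []
    for y, row in enumerate(normals):
        out = []
        for x, n in enumerate(row):
            n_dot_l = sum(a * b for a, b in zip(n, light))
            shading = ambient + diffuse * max(0.0, n_dot_l)
            out.append(albedo[y][x] * shading)
        image.append(out)
    return image


# Usage: a flat surface lit head-on, so the shading is uniform.
flat_depth = [[1.0] * 3 for _ in range(3)]
gray_albedo = [[0.5] * 3 for _ in range(3)]
image = render(gray_albedo, flat_depth, light_dir=(0.0, 0.0, 1.0))
```

Because the depth and albedo are explicit arrays rather than weights of an implicit network, they can be edited directly (for example, swapping the albedo while keeping the depth) before rendering, which is the kind of downstream manipulation the abstract refers to.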

Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image

SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

Realistic Face Reenactment Via Self-Supervised Disentangling of Identity and Pose

2D GANs Meet Unsupervised Single-view 3D Reconstruction

Feature Sharing Attention 3D Face Reconstruction with Unsupervised Learning from In-the-Wild Photo Collection

Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

3D-Mask-GAN: Unsupervised Single-View 3D Object Reconstruction

PR3D: Precise and realistic 3D face reconstruction from a single image

Unsupervised 3D Reconstruction from a Single Image via Adversarial Learning

Face Denoising and 3D Reconstruction from A Single Depth Image

Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set

3D-GANTex: 3D Face Reconstruction with StyleGAN3-based Multi-View Images and 3DDFA based Mesh Generation

Advanced 3D Face Reconstruction from Single 2D Images Using Enhanced Adversarial Neural Networks and Graph Neural Networks

3D Face Arbitrary Style Transfer

3D Face Style Transfer with a Hybrid Solution of NeRF and Mesh Rasterization

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

A Self-Supervised Bootstrap Method for Single-Image 3D Face Reconstruction

3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation

Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation

3D Face Reconstruction with Geometry Details from a Single Image