Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image

Heng Yu, Zoltan A. Milacski, Laszlo A. Jeni
2023-04-25
Abstract: Inferring 3D object structures from a single image is an ill-posed task due to depth ambiguity and occlusion. Typical resolutions in the literature include leveraging 2D or 3D ground truth for supervised learning, as well as imposing hand-crafted symmetry priors or using an implicit representation to hallucinate novel viewpoints for unsupervised methods. In this work, we propose a general adversarial learning framework for solving Unsupervised 2D to Explicit 3D Style Transfer (UE3DST). Specifically, we merge two architectures: the unsupervised explicit 3D reconstruction network of Wu et al. and the Generative Adversarial Network (GAN) named StarGAN-v2. We experiment across three facial datasets (Basel Face Model, 3DFAW and CelebA-HQ) and show that our solution is able to outperform well-established solutions such as DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer, while we also justify the individual contributions of our model components via ablation. In contrast to the aforementioned baselines, our scheme produces features for explicit 3D rendering, which can be manipulated and utilized in downstream tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses unsupervised explicit 3D face reconstruction and style transfer from a single image, a task the authors call Unsupervised 2D to Explicit 3D Style Transfer (UE3DST). Specifically, it proposes a general adversarial learning framework that predicts the features of the 3D rendering process (such as albedo, depth, shadow, and surface normal) from a single 2D image, so that the 3D object can be reconstructed and images with new styles, including changes in shape and appearance, can be synthesized. Existing work typically approaches this problem either with supervised learning on 2D or 3D ground truth, or, in the unsupervised setting, by imposing hand-crafted symmetry priors or using implicit representations to hallucinate novel viewpoints. These methods therefore either require supervision or depend on hand-crafted priors or implicit representations, whereas the proposed method performs explicit 3D reconstruction and style transfer without supervision, which is a more challenging task. The main contributions of the paper are as follows:

1. **A framework that combines an unsupervised explicit 3D reconstruction network with a generative adversarial network (GAN)**: The framework merges the reconstruction network of Wu et al. with StarGAN-v2. It recovers 3D structure from a single 2D image without labeled data and can change the style of the reconstructed 3D object.
2. **Experimental validation on multiple facial datasets**: The authors ran experiments on three datasets: Basel Face Model, 3DFAW, and CelebA-HQ. They report that the proposed method outperforms DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer.
3. **Ablation studies of the model components**: The authors use ablation to justify the individual contribution of each component of the model.
In summary, this paper tackles unsupervised explicit 3D reconstruction and style transfer from a single image. Its adversarial learning framework combines unsupervised 3D reconstruction with style transfer, and the authors report better results than DepthNet and Pix2NeRF across three facial datasets.
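To make the notion of explicit rendering features more concrete, here is a minimal NumPy sketch of how albedo, depth, and surface normals can be combined into a shaded image. It is not the paper's renderer: it assumes a simple Lambertian shading model, normals estimated from depth by finite differences, and hypothetical names (`normals_from_depth`, `lambertian_render`, `ambient`, `diffuse`) chosen only for illustration.

```python
import numpy as np


def normals_from_depth(depth):
    """Estimate unit surface normals from an (H, W) depth map by finite differences."""
    dz_dy, dz_dx = np.gradient(depth)  # gradients along rows (y) and columns (x)
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals


def lambertian_render(albedo, depth, light_dir, ambient=0.2, diffuse=0.8):
    """Shade an (H, W, 3) albedo map using normals derived from depth.

    Assumed model (illustrative only): shading = ambient + diffuse * max(0, n . l).
    """
    normals = normals_from_depth(depth)
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)  # normalize without mutating the input
    shading = ambient + diffuse * np.clip(normals @ light, 0.0, None)  # (H, W)
    image = albedo * shading[..., None]
    return image, normals, shading


# Usage: a synthetic bump lit from the upper left.
ys, xs = np.mgrid[-1:1:64j, -1:1:64j]
depth = np.exp(-4.0 * (xs**2 + ys**2))
albedo = np.full((64, 64, 3), 0.7)
image, normals, shading = lambertian_render(albedo, depth, light_dir=[-1.0, -1.0, 1.0])
```

Because every intermediate quantity (depth, normals, shading, albedo) is an explicit array, each one can be edited or swapped before re-rendering, which is the kind of manipulation the abstract says explicit features allow.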