In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing

Yiran Xu, Zhixin Shu, Cameron Smith, Seoung Wug Oh, Jia-Bin Huang
2024-04-15
Abstract: 3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts. GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code. However, a model pre-trained on a particular dataset (e.g., FFHQ) often has difficulty reconstructing images with out-of-distribution (OOD) objects such as faces with heavy make-up or occluding objects. We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs. Our core idea is to represent the image using two individual neural radiance fields: one for the in-distribution content and the other for the out-of-distribution object. The final reconstruction is achieved by optimizing the composition of these two radiance fields with carefully designed regularization. We demonstrate that our explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate reconstruction accuracy and editability of our method on challenging real face images and videos and showcase favorable results against other baselines.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is that existing 3D GAN inversion methods struggle to achieve both high-quality reconstruction and semantic editing on images and videos that contain out-of-distribution (OOD) objects, that is, objects outside the training data distribution such as heavy makeup or occlusions. Specifically, pre-trained 3D GAN models are usually good only at handling natural human faces and model OOD objects poorly, which leads to poor reconstruction quality or unsatisfactory editing results.

To solve this problem, the authors propose a method that explicitly decomposes an image into two independent neural radiance fields, one representing the in-distribution (InD) content and the other representing the OOD objects. This allows a more faithful reconstruction of the input image while retaining the ability to edit it semantically. The key points of the method are:

- **Explicit Decomposition**: Represent the InD part of the image (such as a natural human face) and the OOD part (such as heavy makeup or occlusions) with two different neural radiance fields.
- **Composite Volume Rendering**: Produce the final image reconstruction by combining these two radiance fields.
- **Optimization Strategy**: Introduce specific regularization terms to ensure the accuracy of reconstruction and the controllability of editing.

In this way, the method can significantly improve reconstruction fidelity and editing quality when dealing with challenging OOD objects.

### Summary of Mathematical Formulas

1. **Latent Code Regularization Loss**:
   \[ L_w(w_t)=\|w_t - \bar{w}\|_2^2 \]
   where \(\bar{w}\) is the average latent code computed from 10,000 sampled latent codes.
2. **Style Vector Variation Regularization**:
   \[ L_\Delta(w_t)=\sum_{i = 1}^{13}\|\Delta_i\|_2^2 \]
   where \(w=(w_0, w_0+\Delta_1,\cdots, w_0+\Delta_{13})\in\mathbb{R}^{14\times512}\).
3. **Color, Density and Mixing Weight Output**:
   \[ (c_O,\sigma_O,b)=D_O(T_O(t_k),\phi_t;\theta_{D_O}) \]
   where \(T_O(t_k)\in\mathbb{R}^{32}\) is the feature projected from three feature planes by bilinear interpolation.
4. **Volume Rendering Integral**:
   \[ C_O(r)=\sum_{k = 1}^K T(t_k)\,\alpha_O(\sigma_O(t_k)\delta_k)\,c_O(t_k) \]
   where \(T(t_k)=\exp\left(-\sum_{k'= 1}^{k - 1}\sigma_O(t_{k'})\delta_{k'}\right)\), \(\alpha_O(x) = 1-\exp(-x)\), and \(\delta_k=t_{k + 1}-t_k\).
5. **Composite Volume Rendering**:
   \[ C_C(r)=\sum_{k = 1}^K T_C(t_k)\left(b\,\alpha_O(\sigma_O(t_k)\delta_k)\,c_O(t_k)+(1 - b)\,\alpha_I(\sigma_I(t_k)\delta_k)\,c_I(t_k)\right) \]
   where \(T_C(t_k)=\exp\left(-\sum_{k'= 1}^{k - 1}\left(\sigma_O(t_{k'})+\sigma_I(t_{k'})\right)\delta_{k'}\right)\).
6. **Total Loss Function**:
   \[ L_{LR}=\sum_{t = 1}^N L_C^t+