Abstract: 3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts. GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code. However, a model pre-trained on a particular dataset (e.g., FFHQ) often has difficulty reconstructing images with out-of-distribution (OOD) objects such as faces with heavy make-up or occluding objects. We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs. Our core idea is to represent the image using two individual neural radiance fields: one for the in-distribution content and the other for the out-of-distribution object. The final reconstruction is achieved by optimizing the composition of these two radiance fields with carefully designed regularization. We demonstrate that our explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate reconstruction accuracy and editability of our method on challenging real face images and videos and showcase favorable results against other baselines.
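To make the inversion step concrete, here is a minimal sketch of optimization-based GAN inversion. It assumes a hypothetical linear toy generator \(G(w) = Aw\) in place of a real 3D-aware GAN, so the gradient has a closed form. It searches for a latent code by gradient descent on a reconstruction loss plus the mean-latent penalty \(\|w-\bar{w}\|_2^2\) listed in the formula summary. All names and hyperparameters (`A`, `w_mean`, `lam`, `lr`, `steps`) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "generator": a fixed linear map from an 8-D latent code to a 32-D "image".
A = rng.standard_normal((32, 8))
w_mean = np.zeros(8)  # assumed average latent code, standing in for \bar{w}


def generate(w):
    return A @ w


def invert(x, lam=1e-2, lr=5e-3, steps=2000):
    """Gradient descent on E(w) = ||G(w) - x||^2 + lam * ||w - w_mean||^2."""
    w = w_mean.copy()  # start from the average latent code
    for _ in range(steps):
        residual = generate(w) - x
        grad = 2.0 * A.T @ residual + 2.0 * lam * (w - w_mean)
        w = w - lr * grad
    return w


w_true = rng.standard_normal(8)
x = generate(w_true)  # a target that lies inside the toy generator's range
w_hat = invert(x)
# Small but nonzero: the regularizer adds a slight bias toward w_mean.
print(np.linalg.norm(generate(w_hat) - x))
```

Because `x` lies in the range of `A`, the residual can be driven close to zero. A target with a component outside that range, a loose toy analogue of an OOD object, leaves a residual that no latent code can remove. That residual is the gap a separate OOD radiance field is meant to absorb.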

What problem does this paper attempt to address?

The main problem this paper addresses is that existing 3D GAN inversion methods struggle to achieve high-quality reconstruction and semantic editing on images and videos that contain objects outside the training data distribution (out-of-distribution, OOD), such as heavy make-up or occlusions. Pre-trained 3D GAN models typically handle natural human faces well but model OOD objects poorly, which leads to low reconstruction quality or unsatisfactory edits.

To address this, the authors propose explicitly decomposing an image into two independent neural radiance fields: one for in-distribution (InD) content and one for OOD objects. This allows a more faithful reconstruction of the input image while retaining the ability to perform semantic edits. The key points of the method are:

- **Explicit Decomposition**: The InD part (e.g., the natural face) and the OOD part (e.g., heavy make-up or occlusions) are represented by two separate neural radiance fields.
- **Combined Volume Rendering**: The final image is reconstructed by volume rendering the combination of the two radiance fields.
- **Optimization Strategy**: Dedicated regularization terms keep the reconstruction accurate and the edits controllable.

In this way, the method improves both reconstruction fidelity and editing quality on challenging OOD objects, easing the usual trade-off between the two.

### Summary of Mathematical Formulas

1. **Latent Code Regularization Loss**:
   \[ L_w(w_t)=\|w_t-\bar{w}\|_2^2 \]
   where \(\bar{w}\) is the average latent code computed from 10,000 sampled latent codes.
2. **Style Vector Variation Regularization**:
   \[ L_\Delta(w_t)=\sum_{i=1}^{13}\|\Delta_i\|_2^2 \]
   where \(w=(w_0,\,w_0+\Delta_1,\,\dots,\,w_0+\Delta_{13})\in\mathbb{R}^{14\times512}\), so each \(\Delta_i\) is the offset of the \(i\)-th style vector from \(w_0\).
3. **Color, Density and Mixing Weight Output**:
   \[ (c_O,\sigma_O,b)=D_O\big(T_O(t_k),\phi_t;\theta_{D_O}\big) \]
   where \(T_O(t_k)\in\mathbb{R}^{32}\) is the feature sampled from three feature planes by bilinear interpolation at the point \(t_k\), and \(b\) is the per-sample mixing weight.
4. **Volume Rendering Integral** (for a ray \(r\) with samples \(t_1,\dots,t_K\)):
   \[ C_O(r)=\sum_{k=1}^{K} T(t_k)\,\alpha_O\big(\sigma_O(t_k)\delta_k\big)\,c_O(t_k) \]
   where \(T(t_k)=\exp\left(-\sum_{k'=1}^{k-1}\sigma(t_{k'})\delta_{k'}\right)\), \(\alpha(x)=1-\exp(-x)\), and \(\delta_k=t_{k+1}-t_k\).
5. **Composite Volume Rendering**:
   \[ C_C(r)=\sum_{k=1}^{K} T_C(t_k)\Big(b\,\alpha_O\big(\sigma_O(t_k)\delta_k\big)\,c_O(t_k)+(1-b)\,\alpha_I\big(\sigma_I(t_k)\delta_k\big)\,c_I(t_k)\Big) \]
   where \(T_C(t_k)=\exp\left(-\sum_{k'=1}^{k-1}\big(\sigma_O(t_{k'})+\sigma_I(t_{k'})\big)\delta_{k'}\right)\).
6. **Total Loss Function**:
   \[ L_{LR}=\sum_{t=1}^{N} L_C^t+\cdots \]
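The rendering equations and the style-vector regularizer can be checked numerically. Below is a minimal NumPy sketch of formulas 2, 4 and 5 for a single ray. It assumes the per-sample densities, colors, and mixing weights \(b\) have already been produced by the two fields; the feature-plane sampling and decoders are not modeled. Function names such as `render_composite` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def alpha(x):
    # alpha(x) = 1 - exp(-x): opacity of a ray segment with optical depth x.
    return 1.0 - np.exp(-x)


def transmittance(sigma, delta):
    # T(t_k) = exp(-sum_{k'=1}^{k-1} sigma(t_k') delta_k'); exclusive cumsum so T(t_1) = 1.
    depth = np.cumsum(sigma * delta)
    return np.exp(-np.concatenate(([0.0], depth[:-1])))


def render_single(sigma, c, delta):
    # Formula 4: C(r) = sum_k T(t_k) alpha(sigma(t_k) delta_k) c(t_k).
    weights = transmittance(sigma, delta) * alpha(sigma * delta)  # shape (K,)
    return weights @ c  # c has shape (K, 3) -> RGB of shape (3,)


def render_composite(sigma_o, c_o, sigma_i, c_i, b, delta):
    # Formula 5: transmittance uses sigma_O + sigma_I; contributions are mixed by b in [0, 1].
    T_c = transmittance(sigma_o + sigma_i, delta)
    w_o = T_c * b * alpha(sigma_o * delta)
    w_i = T_c * (1.0 - b) * alpha(sigma_i * delta)
    return w_o @ c_o + w_i @ c_i


def latent_delta_loss(w):
    # Formula 2: rows of w are (w_0, w_0 + Delta_1, ..., w_0 + Delta_13); sum ||Delta_i||^2.
    return float(np.sum((w[1:] - w[0]) ** 2))


# Usage on random per-sample values for one ray.
K = 64
t = np.linspace(0.0, 1.0, K + 1)  # K + 1 boundaries, so delta_k = t_{k+1} - t_k
delta = np.diff(t)
rng = np.random.default_rng(0)
sigma_i, c_i = rng.uniform(0, 5, K), rng.uniform(0, 1, (K, 3))
sigma_o, c_o = rng.uniform(0, 5, K), rng.uniform(0, 1, (K, 3))
b = rng.uniform(0, 1, K)  # per-sample mixing weight, as output alongside c_O and sigma_O
print(render_composite(sigma_o, c_o, sigma_i, c_i, b, delta))
```

A useful sanity check of formula 5 as transcribed: setting \(\sigma_O = 0\) and \(b = 0\) everywhere reduces `render_composite` to `render_single` on the InD field, and setting \(\sigma_I = 0\), \(b = 1\) reduces it to the OOD field alone.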

In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
