What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion

Pu Cao, Lu Yang, Dongxv Liu, Xiaoya Yang, Tianrui Huang, Qing Song
2023-11-01
Abstract: Recently, inversion methods have focused on additional high-rate information in the generator (e.g., weights or intermediate features) to refine inversion and editing results from embedded latent codes. Although these techniques gain reasonable improvement in reconstruction, they decrease editing capability, especially on complex images (e.g., containing occlusions, detailed backgrounds, and artifacts). A vital crux is refining inversion results, avoiding editing capability degradation. To tackle this problem, we introduce Domain-Specific Hybrid Refinement (DHR), which draws on the advantages and disadvantages of two mainstream refinement techniques to maintain editing ability with fidelity improvement. Specifically, we first propose Domain-Specific Segmentation to segment images into two parts: in-domain and out-of-domain parts. The refinement process aims to maintain the editability for in-domain areas and improve two domains' fidelity. We refine these two parts by weight modulation and feature modulation, which we call Hybrid Modulation Refinement. Our proposed method is compatible with all latent code embedding methods. Extensive experiments demonstrate that our approach achieves state-of-the-art in real image inversion and editing. Code is available at https://github.com/caopulan/Domain-Specific_Hybrid_Refinement_Inversion.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a problem in Generative Adversarial Network (GAN) inversion: refining inversion results with high-bit-rate information reduces editing capability on images that contain occlusions, complex backgrounds, and fine details. Although existing refinement methods significantly improve image reconstruction, they often disrupt the highly semantic latent space of pre-trained GANs, which reduces editing capability. To overcome this, the authors propose a new refinement mechanism, Domain-Specific Hybrid Refinement (DHR), which aims to improve reconstruction accuracy during inversion while retaining editing capability.

### Background of the Paper

Generative Adversarial Networks (GANs) perform very well in image generation, producing high-resolution images that are difficult to distinguish from real ones. Because of their highly semantic latent space, GANs are widely used for image manipulation and controllable generation. Real images, however, cannot be used directly in these tasks, because the tasks require latent codes in the GAN's latent space. Inversion techniques were therefore developed to map real images into the latent space of a GAN for reconstruction and editing.

### Problem Description

Existing inversion methods fall mainly into two categories: training an image encoder that maps a given image to a latent code, and iteratively optimizing an initial latent code by minimizing the difference between the given image and the reconstructed image. Although these methods have improved reconstruction performance, the latent code is low-bit-rate, so high-bit-rate image details are often not faithfully reconstructed. To address this, many studies introduce high-bit-rate information to refine the results, mainly through weight modulation and feature modulation.
However, while these methods improve reconstruction accuracy, they often reduce editing capability, especially on images containing complex parts.

### Solution

The authors propose the Domain-Specific Hybrid Refinement (DHR) method, which addresses this issue with a "divide and conquer" strategy. Specifically, DHR divides the image into in-domain and out-of-domain parts:

- **In-domain parts**: These areas are close to the generator's output distribution, making them easy to invert and edit.
- **Out-of-domain parts**: These areas are inconsistent with the generator's output distribution, making them difficult to invert but requiring faithful reconstruction.

The DHR method includes three main components:

1. **Image Embedding Module**: Uses a pre-trained encoder to embed the image into a latent code.
2. **Domain-Specific Segmentation Module**: Automatically segments the image into in-domain and out-of-domain parts without additional data annotation.
3. **Hybrid Modulation Refinement Module**: Applies weight modulation to the in-domain parts and feature modulation to the out-of-domain parts to maintain editing capability and improve reconstruction accuracy.

### Experimental Results

Experimental results show that the DHR method achieves significant improvements in both image inversion and editing tasks, and in particular outperforms existing methods on real-world images. User studies also indicate that users prefer images generated by the DHR method because it reduces image distortion and makes reconstruction and editing results more realistic.

### Conclusion

This paper proposes a new inversion method, Domain-Specific Hybrid Refinement (DHR), which addresses the reduced editing capability of existing methods on complex images by dividing the image into in-domain and out-of-domain parts and applying weight modulation and feature modulation to them, respectively.
The DHR method achieves significant improvements in both inversion and editing tasks, demonstrating its potential in practical applications.
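
The "divide and conquer" idea from the Solution section can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that the in-domain mask comes from thresholding per-pixel reconstruction error (the summary does not say how the segmentation is computed) and that the two refined results are combined in pixel space, whereas the paper modulates generator weights and intermediate features. The names `segment_by_error`, `hybrid_blend`, and `threshold` are hypothetical.

```python
from typing import List

# Toy grayscale image: rows of pixel intensities in [0, 1].
Image = List[List[float]]


def segment_by_error(image: Image, coarse: Image, threshold: float = 0.1) -> Image:
    """Hypothetical stand-in for domain-specific segmentation.

    A pixel is marked in-domain (1.0) when the coarse reconstruction obtained
    from the latent code is already close to the input, and out-of-domain
    (0.0) otherwise.
    """
    return [
        [1.0 if abs(p - q) <= threshold else 0.0 for p, q in zip(row_img, row_rec)]
        for row_img, row_rec in zip(image, coarse)
    ]


def hybrid_blend(mask: Image, weight_refined: Image, feature_refined: Image) -> Image:
    """Combine two refined results using the mask.

    In-domain pixels come from the weight-modulated result (assumed to stay
    editable); out-of-domain pixels come from the feature-modulated result
    (assumed to be more faithful to the input).
    """
    return [
        [m * w + (1.0 - m) * f for m, w, f in zip(row_m, row_w, row_f)]
        for row_m, row_w, row_f in zip(mask, weight_refined, feature_refined)
    ]


# Made-up 2x2 example: the right column is poorly reconstructed (out-of-domain).
image = [[0.2, 0.9], [0.5, 0.1]]
coarse = [[0.25, 0.3], [0.5, 0.6]]
weight_refined = [[0.21, 0.4], [0.5, 0.5]]
feature_refined = [[0.0, 0.9], [0.0, 0.1]]

mask = segment_by_error(image, coarse)
result = hybrid_blend(mask, weight_refined, feature_refined)
print(mask)
print(result)
```

In this made-up example the left column is treated as in-domain and taken from the weight-refined result, while the right column is taken from the feature-refined result. A hard binary mask is used only to keep the sketch short.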