Abstract: Semantic image editing requires inpainting pixels following a semantic map. It is a challenging task since this inpainting requires both harmony with the context and strict compliance with the semantic maps. The majority of the previous methods proposed for this task try to encode all of the information from the erased images. However, when an object such as a car is added to a scene, its style cannot be encoded from the context alone. On the other hand, the models that can output diverse generations struggle to output images that have seamless boundaries between the generated and unerased parts. Additionally, previous methods do not have a mechanism to encode the styles of visible and partially visible objects differently for better performance. In this work, we propose a framework that can encode visible and partially visible objects with a novel mechanism to achieve consistency in the style encoding and final generations. We extensively compare with previous conditional image generation and semantic image editing algorithms. Our extensive experiments show that our method significantly improves over the state-of-the-art. Our method not only achieves better quantitative results but also provides diverse results. Please refer to the project web page for the released code and demo: https://github.com/hakansivuk/DivSem

What problem does this paper attempt to address?

The paper primarily aims to address several key challenges in Semantic Image Editing, particularly how to maintain consistency and coherence of the generated content with the surrounding environment when adding, deleting, or modifying objects in an image based on a semantic map. Specifically, the paper attempts to solve the following issues:

1. **Style Consistency**: When performing semantic editing on an image, the generated parts need to maintain stylistic consistency with the rest of the image. Existing methods often struggle to maintain this consistency while generating new content.
2. **Boundary Smoothness**: There are often noticeable boundaries between the generated content and the original unedited parts, affecting the visual effect. The proposed method aims to generate smooth transition boundaries.
3. **Style Encoding of Visible and Partially Visible Objects**: Existing methods usually lack a mechanism to distinctly encode visible and partially visible objects. This leads to poor performance when dealing with partially occluded situations.
4. **Diversity Realization**: Many existing models either fail to generate diverse results or sacrifice the coherence between the generated content and the original image while generating diverse images.

To address the above issues, the paper proposes a new framework that can effectively encode the style information of visible and partially visible objects and achieve style consistency and diverse generation results. Experimental validation shows that this method significantly improves over existing conditional image generation and semantic image editing algorithms, outperforming the current state-of-the-art on multiple datasets. Additionally, this method performs well in various application scenarios, including object removal, diverse object addition, panorama generation, and diverse semantic filling.

Diverse Semantic Image Editing with Style Codes

Diverse Semantic Image Synthesis via Probability Distribution Modeling

Context-Consistent Semantic Image Editing with Style-Preserved Modulation

SIEDOB: Semantic Image Editing by Disentangling Object and Background

Semantic Probability Distribution Modeling for Diverse Semantic Image Synthesis

SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

On the Diversity of Conditional Image Synthesis with Semantic Layouts

Semantic Image Synthesis via Class-Adaptive Cross-Attention

Semantic-related image style transfer with dual-consistency loss

Sem-CS: Semantic CLIPStyler for Text-Based Image Style Transfer

Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks

Semantic-aware Noise Driven Portrait Synthesis and Manipulation

Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods

Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics

Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

Semantics-Guided Object Removal for Facial Images: with Broad Applicability and Robust Style Preservation

IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis

StyleAdapter: A Unified Stylized Image Generation Model

Semantic Image Manipulation Using Scene Graphs