Diverse Semantic Image Editing with Style Codes

Hakan Sivuk, Aysegul Dundar
2023-09-25
Abstract: Semantic image editing requires inpainting pixels following a semantic map. It is a challenging task since this inpainting requires both harmony with the context and strict compliance with the semantic maps. The majority of the previous methods proposed for this task try to encode the whole information from erased images. However, when an object is added to a scene such as a car, its style cannot be encoded from the context alone. On the other hand, the models that can output diverse generations struggle to output images that have seamless boundaries between the generated and unerased parts. Additionally, previous methods do not have a mechanism to encode the styles of visible and partially visible objects differently for better performance. In this work, we propose a framework that can encode visible and partially visible objects with a novel mechanism to achieve consistency in the style encoding and final generations. We extensively compare with previous conditional image generation and semantic image editing algorithms. Our extensive experiments show that our method significantly improves over the state-of-the-art. Our method not only achieves better quantitative results but also provides diverse results. Please refer to the project web page for the released code and demo: https://github.com/hakansivuk/DivSem.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address several key challenges in Semantic Image Editing, particularly how to maintain consistency and coherence of the generated content with the surrounding environment when adding, deleting, or modifying objects in an image based on a semantic map. Specifically, the paper attempts to solve the following issues:

1. **Style Consistency**: When performing semantic editing on an image, the generated parts need to maintain stylistic consistency with the rest of the image. Existing methods often struggle to maintain this consistency while generating new content.
2. **Boundary Smoothness**: There are often noticeable boundaries between the generated content and the original unedited parts, affecting the visual effect. The proposed method aims to generate smooth transition boundaries.
3. **Style Encoding of Visible and Partially Visible Objects**: Existing methods usually lack a mechanism to distinctly encode visible and partially visible objects. This leads to poor performance when dealing with partially occluded situations.
4. **Diversity Realization**: Many existing models either fail to generate diverse results or sacrifice the coherence between the generated content and the original image while generating diverse images.

To address the above issues, the paper proposes a new framework that can effectively encode the style information of visible and partially visible objects and achieve style consistency and diverse generation results. Experimental validation shows that this method significantly improves over existing conditional image generation and semantic image editing algorithms, outperforming the current state-of-the-art on multiple datasets. Additionally, this method performs well in various application scenarios, including object removal, diverse object addition, panorama generation, and diverse semantic filling.
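To make the distinction between visible, partially visible, and newly added objects concrete, the following is a minimal NumPy sketch of the problem setup only. It is not the paper's architecture: the masked-average-pooling encoder, the function names `per_class_style_codes` and `fill_missing_codes`, and the Gaussian prior for missing codes are all assumptions chosen for illustration. It shows why a class that lies entirely inside the erased region (such as an added car) has no style that can be read from the context, and why its code has to come from somewhere else, for example random sampling, which is also where diversity can enter.

```python
import numpy as np


def per_class_style_codes(features, semantic_map, erased_mask, num_classes):
    """Average features over the visible (unerased) pixels of each class.

    Hypothetical helper, not taken from the paper.
    features:     (H, W, C) float array standing in for an encoder output
    semantic_map: (H, W) int array of class ids
    erased_mask:  (H, W) bool array, True where pixels were erased
    Returns (codes, visible_fraction); codes[k] is NaN when class k
    has no visible pixel at all.
    """
    channels = features.shape[-1]
    codes = np.full((num_classes, channels), np.nan)
    visible_fraction = np.zeros(num_classes)
    for k in range(num_classes):
        class_pixels = semantic_map == k
        visible = class_pixels & ~erased_mask
        if class_pixels.any():
            visible_fraction[k] = visible.sum() / class_pixels.sum()
        if visible.any():
            # Boolean indexing yields an (n_visible, C) array.
            codes[k] = features[visible].mean(axis=0)
    return codes, visible_fraction


def fill_missing_codes(codes, rng):
    """Sample a code from a standard normal prior (an assumption) for
    every class whose style could not be encoded from the context."""
    filled = codes.copy()
    missing = np.isnan(filled).any(axis=1)
    filled[missing] = rng.standard_normal((int(missing.sum()), filled.shape[1]))
    return filled, missing


# Toy 4x4 scene: class 0 = road, 1 = car (added), 2 = tree.
semantic_map = np.array([
    [2, 2, 2, 0],
    [2, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
])
erased_mask = np.zeros((4, 4), dtype=bool)
erased_mask[:, 2:] = True  # the right half of the image is erased

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 4, 8))

codes, visible_fraction = per_class_style_codes(
    features, semantic_map, erased_mask, num_classes=3
)
filled, missing = fill_missing_codes(codes, rng)

print(visible_fraction.tolist())  # road and tree are partially visible, car is not
print(missing.tolist())           # only the car needs a sampled style code
```

In this toy scene the road and the tree are partially visible, so their codes are averaged from fewer pixels than a fully visible object would provide, while the car has no visible pixels and receives a sampled code. Drawing a different sample for the car would, under these assumptions, correspond to a different generated appearance for the same semantic map.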