Controllable multi-domain semantic artwork synthesis

Yuantian Huang, Satoshi Iizuka, Edgar Simo-Serra, Kazuhiro Fukui
DOI: https://doi.org/10.1007/s41095-023-0356-2
IF: 4.1268
2024-01-03
Computational Visual Media
Abstract: We present a novel framework for the multi-domain synthesis of artworks from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset called ArtSem that contains 40,000 images of artwork from four different domains, with their corresponding semantic label maps. We first extracted semantic maps from landscape photography and used a conditional generative adversarial network (GAN)-based approach for generating high-quality artwork from semantic maps without requiring paired training data. Furthermore, we propose an artwork-synthesis model using domain-dependent variational encoders for high-quality multi-domain synthesis. Subsequently, the model was improved and complemented with a simple but effective normalization method based on jointly normalizing semantics and style, which we call spatially style-adaptive normalization (SSTAN). Compared to the previous methods, which only take semantic layout as the input, our model jointly learns style and semantic information representation, improving the generation quality of artistic images. These results indicate that our model learned to separate the domains in the latent space. Thus, we can perform fine-grained control of the synthesized artwork by identifying hyperplanes that separate the different domains. Moreover, by combining the proposed dataset and approach, we generated user-controllable artworks of higher quality than that of existing approaches, as corroborated by quantitative metrics and a user study.
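The abstract describes SSTAN only as "jointly normalizing semantics and style" and gives no formula. As a rough illustration of what such a layer might look like, the sketch below normalizes one feature channel and then scales and shifts every pixel with parameters that depend on both that pixel's semantic label and a style code. The names `modulate`, `gamma_w`, and `beta_w`, and the linear form of the modulation, are assumptions made for illustration; this is not the paper's definition of SSTAN.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(feature, eps=1e-5):
    """Zero-mean, unit-variance normalization of one H x W feature channel."""
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in feature]


def modulate(feature, labels, style, gamma_w, beta_w, eps=1e-5):
    """Normalize a channel, then apply a per-pixel scale and shift.

    ASSUMPTION (not from the paper): the scale and shift at each pixel are
    linear in the style code, with one hypothetical weight vector per
    semantic class:
        gamma = 1 + gamma_w[label] . style
        beta  = beta_w[label] . style
    """
    normed = normalize(feature, eps)
    out = []
    for norm_row, label_row in zip(normed, labels):
        out_row = []
        for value, label in zip(norm_row, label_row):
            gamma = 1.0 + dot(gamma_w[label], style)
            beta = dot(beta_w[label], style)
            out_row.append(gamma * value + beta)
        out.append(out_row)
    return out


if __name__ == "__main__":
    feature = [[1.0, 3.0]]            # one channel, 1 x 2 pixels
    labels = [[0, 1]]                 # semantic class per pixel
    style = [1.0, 0.0]                # toy style code
    gamma_w = {0: [0.0, 0.0], 1: [1.0, 0.0]}
    beta_w = {0: [0.5, 0.0], 1: [0.0, 0.0]}
    print(modulate(feature, labels, style, gamma_w, beta_w))
```

With an all-zero style code the layer reduces to plain normalization, which makes the contribution of the style-dependent terms easy to isolate in this toy setting.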
computer science, software engineering
What problem does this paper attempt to address?
The main problems this paper addresses are the limited controllability of artwork generation and the shortage of datasets. Specifically, existing artwork generation methods often lack fine-grained control over the generated content, and there are no publicly available segmentation datasets to support art synthesis. To solve these problems, the authors propose a new dataset named ArtSem, which contains 40,000 artwork images from four different domains together with their corresponding semantic label maps. In addition, the authors propose a multi-domain, high-quality artwork synthesis model (CMSAS). This model uses domain-specific variational encoders and a generator based on spatially style-adaptive normalization (SSTAN) to synthesize high-quality, multi-domain artworks from semantic layouts.

### Main contributions:

1. **Single-model semantic artwork synthesis method**: It generates high-quality artworks in multiple domains from semantic layout inputs that are easy to edit.
2. **High-quality pixel-aligned semantic artwork dataset**: It contains art images with paired segmentation masks.
3. **Effective normalization method**: It significantly improves the generation quality of artworks.
4. **Highly controllable generation**: It enables domain and style control through latent space operations.
5. **In-depth evaluation**: The proposed method is evaluated comprehensively through qualitative and quantitative comparisons with existing methods.

### Solutions:

- **Dataset construction**: High-quality paired training data is generated by extracting semantic maps from landscape photos and using an unsupervised image-to-image translation model to convert the photos into different art styles.
- **Model architecture**: The CMSAS model includes domain-specific variational encoders and a generator. The generator uses the SSTAN module to combine semantic and style information and improve the quality of the generated images.
- **Domain and style control**: Fine-grained control over the generated artworks is achieved by identifying hyperplanes that separate different domains in the latent space.

### Experimental results:

- **Quantitative evaluation**: The method is evaluated with automatic metrics and a perceptual user study, and the results show that it outperforms existing methods.
- **Qualitative evaluation**: The generated artworks are of higher quality, and users can achieve fine-grained control over the results by adjusting the semantic layout and latent codes.

In conclusion, this paper addresses the problems of controllability and dataset shortage in artwork generation by introducing a new dataset and an improved generation model, providing new tools and methods for art creation.
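The hyperplane-based control mentioned above can be illustrated with a small sketch. It assumes a separating hyperplane `w · z + b = 0` between two domains has already been found (for example, by fitting a linear classifier on latent codes) and shows the generic step of moving a latent code along the hyperplane's unit normal. The function names and the toy numbers are hypothetical; the paper's actual editing procedure may differ.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def signed_distance(z, w, b=0.0):
    """Signed distance from latent code z to the hyperplane w . z + b = 0.

    The sign tells which side (which domain, in this toy setting) z lies on.
    """
    return (sum(zi * wi for zi, wi in zip(z, w)) + b) / norm(w)


def move_along_normal(z, w, alpha):
    """Shift z by alpha units along the unit normal of the hyperplane.

    Positive alpha pushes z toward the positive side, negative alpha toward
    the other side; the signed distance changes by exactly alpha.
    """
    length = norm(w)
    if length == 0.0:
        raise ValueError("hyperplane normal must be non-zero")
    return [zi + alpha * wi / length for zi, wi in zip(z, w)]


if __name__ == "__main__":
    w = [3.0, 4.0]   # hypothetical hyperplane normal separating two domains
    z = [1.0, 2.0]   # hypothetical latent code
    print(signed_distance(z, w))          # side of the boundary before editing
    z_edit = move_along_normal(z, w, -3.0)
    print(signed_distance(z_edit, w))     # side of the boundary after editing
```

Varying `alpha` continuously gives a gradual transition between the two sides of the boundary, which is the sense in which such edits are fine-grained.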