Controlled GAN-Based Creature Synthesis via a Challenging Game Art Dataset -- Addressing the Noise-Latent Trade-Off

Vaibhav Vavilala, David Forsyth
DOI: https://doi.org/10.48550/arXiv.2108.08922
2021-10-21
Abstract: The state-of-the-art StyleGAN2 network supports powerful methods to create and edit art, including generating random images, finding images "like" some query, and modifying content or style. Further, recent advancements enable training with small datasets. We apply these methods to synthesize card art, by training on a novel Yu-Gi-Oh dataset. While noise inputs to StyleGAN2 are essential for good synthesis, we find that coarse-scale noise interferes with latent variables on this dataset because both control long-scale image effects. We observe over-aggressive variation in art with changes in noise and weak content control via latent variable edits. Here, we demonstrate that training a modified StyleGAN2, where coarse-scale noise is suppressed, removes these unwanted effects. We obtain a superior FID; changes in noise result in local exploration of style; and identity control is markedly improved. These results and analysis lead towards a GAN-assisted art synthesis tool for digital artists of all skill levels, which can be used in film, games, or any creative industry for artistic ideation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks how to synthesize high-quality card art with a Generative Adversarial Network (GAN) trained on a small dataset while keeping effective control over the generated images, in particular over creature identity. The authors find that the existing StyleGAN2 has the following problems on the Yu-Gi-Oh card art dataset, which spans a wide range of identities, poses, lighting, textures, and styles:

1. **Conflict between coarse-scale noise and latent variables**: Both control long-scale image effects, so on this dataset coarse-scale noise interferes with the latent variables. Latent variable edits then give only weak control over content, which makes it very difficult to change creature identity while keeping the card style unchanged.
2. **Impact of noise input**: Noise input is essential for good image synthesis. On this dataset, however, changes in coarse-scale noise cause over-aggressive variation in the art, which undermines content control.

To address these problems, the authors propose a modified StyleGAN2 that suppresses coarse-scale noise to improve both synthesis quality and content control:

- **Suppress coarse-scale noise**: During training and inference, fix the noise weight of the coarse-scale noise layers to 0, so that coarse-scale noise cannot interfere with the long-scale features of the image.
- **Improve synthesis quality**: With coarse-scale noise suppressed, the model controls the long-scale structure of the image better, and the authors report a superior FID.
- **Enhance content control**: Latent variable edits in the modified model change creature identity more effectively without significantly affecting image style, and changes in noise result in local exploration of style.

These improvements raise the visual quality and controllability of the generated images, giving artists a more practical generation tool.
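
The coarse-scale noise suppression described above can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes StyleGAN2-style noise injection, in which each synthesis layer adds per-pixel Gaussian noise scaled by a scalar weight, and it pins that weight to 0 for coarse layers. The cutoff `COARSE_MAX_RES`, the initial weight `init`, and the helper names are hypothetical choices made for illustration; this summary does not say which resolutions count as coarse.

```python
import numpy as np

# Hypothetical cutoff: layers at or below this resolution are treated as "coarse".
COARSE_MAX_RES = 32


def make_noise_weights(resolutions, coarse_max_res=COARSE_MAX_RES, init=0.1):
    """Return one scalar noise weight per layer; coarse layers are pinned to 0.

    `init` is an illustrative starting value for the remaining (fine) layers.
    """
    return {res: (0.0 if res <= coarse_max_res else init) for res in resolutions}


def inject_noise(features, weight, rng):
    """Add weight-scaled Gaussian noise to a (channels, H, W) feature map.

    One noise map is drawn per layer and broadcast across channels.
    """
    _, height, width = features.shape
    noise = rng.standard_normal((1, height, width))
    return features + weight * noise


def mask_coarse_gradients(grads, coarse_max_res=COARSE_MAX_RES):
    """Zero the gradients of coarse noise weights so training keeps them at 0."""
    return {res: (0.0 if res <= coarse_max_res else g) for res, g in grads.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    resolutions = [4, 8, 16, 32, 64, 128, 256]
    weights = make_noise_weights(resolutions)
    for res in resolutions:
        x = np.zeros((3, res, res))
        y = inject_noise(x, weights[res], rng)
        print(res, "changed by noise:", not np.array_equal(x, y))
```

With a zero weight, a coarse layer's output is independent of the noise sample, so resampling noise can only alter fine-scale detail, while long-scale structure is left to the latent variables. Masking the gradient is one simple way to keep the weight fixed during training; a real implementation might instead remove the noise input from those layers entirely.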