Abstract: In the majority of GAN architectures, the latent space is defined as a set of vectors of given dimensionality. Such representations are not easily interpretable and do not capture spatial information of image content directly. In this work, we define a family of spatial latent spaces for StyleGAN2, capable of capturing more details and representing images that are out-of-sample in terms of the number and arrangement of object parts, such as an image of multiple faces or a face with more than two eyes. We propose a method for encoding images into our spaces, together with an attribute model capable of performing attribute editing in these spaces. We show that our spaces are effective for image manipulation and encode semantic information well. Our approach can be used on pre-trained generator models, and attribute editing can be done using pre-generated direction vectors, making the barrier to entry for experimentation and use extremely low. We propose a regularization method for optimizing latent representations, which equalizes distributions of parts of latent spaces, making representations much closer to generated ones. We use it for encoding images into spatial spaces to obtain a significant improvement in quality while keeping semantics and the ability to use our attribute model for editing purposes. In total, using our methods improves encoding quality by as much as 30% in terms of LPIPS score compared to standard methods, while keeping semantics. Additionally, we propose a StyleGAN2 training procedure on our spatial latent spaces, together with a custom spatial latent representation distribution that makes spatially closer elements in the representation more dependent on each other than farther elements. This approach improves the FID score by 29% on SpaceNet, and is able to generate consistent images of arbitrary sizes on spatially homogeneous datasets, like satellite imagery.
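The abstract does not spell out the custom latent distribution, so the following is only a minimal sketch of one way to obtain the stated property (spatially closer elements are more dependent than farther ones). It assumes the latent is a grid of shape `(height, width, channels)` and uses a sliding-window sum of white Gaussian noise; the function name `sample_spatial_latent` and the `window` parameter are hypothetical and not taken from the paper.

```python
import numpy as np


def sample_spatial_latent(height, width, channels, window=3, rng=None):
    """Sample a (height, width, channels) grid whose nearby cells are correlated.

    Assumption for illustration only: this is NOT the paper's distribution,
    just one simple construction with the same qualitative property.
    """
    rng = np.random.default_rng() if rng is None else rng
    # White noise on a padded grid, so every output cell sums a full window.
    noise = rng.standard_normal((height + window - 1, width + window - 1, channels))
    out = np.zeros((height, width, channels))
    for dy in range(window):
        for dx in range(window):
            out += noise[dy:dy + height, dx:dx + width]
    # A sum of window*window independent N(0, 1) values has standard
    # deviation `window`, so dividing by it restores unit variance per cell.
    return out / window


def grid_correlation(z, offset):
    """Empirical correlation between cells that are `offset` columns apart."""
    left = z[:, :-offset].ravel()
    right = z[:, offset:].ravel()
    return float(np.corrcoef(left, right)[0, 1])
```

In this construction two cells share noise terms only when their windows overlap, so correlation decays with distance and vanishes once cells are at least `window` apart. Because the sampler is defined per cell, the grid can be made any size, which loosely mirrors the arbitrary-size generation the abstract describes.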

What problem does this paper attempt to address?

The paper addresses two major challenges that Generative Adversarial Networks (GANs) face when images are encoded into, and edited through, latent representations:

1. **Differences between images reconstructed from latent representations and the original images**: This may be caused by too few parameters in the latent representation, or by the optimization reaching a suboptimal solution, for example through non-convergence, getting stuck in a local minimum, or using a poorly performing loss function.
2. **Lack of semantic meaning in latent representations**: When the latent representation has too many parameters or is not regularized appropriately, the reconstructed image may look very similar to the original, yet editing methods fail to produce the intended effect, such as changing a person's age or performing a meaningful image interpolation.

To solve these problems, the authors propose a new spatial latent representation for the StyleGAN2 architecture that captures the spatial information of images, improving both the quality of image generation and the capacity for semantic editing. Their main contributions are:

- **Defining new spatial latent spaces**: These spaces capture spatial information and do not require retraining the model.
- **Demonstrating the effectiveness of spatial latent spaces in image editing**: This includes spatial mixing and attribute editing.
- **Proposing a method for directly projecting images onto the spatial latent space**: The method is equivariant to translation and provides greater flexibility, for example by supporting multiple faces in one image.
- **Identifying and solving the inconsistency between projected and generated latent representations**: A regularization method is proposed to deal with this problem.
- **Proposing a method for training StyleGAN2 in the spatial latent space**: Latent representations are sampled from a custom distribution in which spatially closer elements are more dependent on each other, which improves the FID score and makes it possible to generate consistent images of any size.

With these methods, the authors improve the quality of image generation and strengthen the semantic and editing capabilities of latent representations, offering new directions for applying GANs to image generation and editing.
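Spatial mixing and attribute editing are named above without detail, so here is a minimal sketch of what they could look like, assuming a spatial latent is simply an array of shape `(H, W, C)` with one style vector per spatial cell. The helper names `spatial_mix` and `edit_attribute`, the boolean `mask`, and the `strength` parameter are hypothetical illustrations, not the paper's API.

```python
import numpy as np


def spatial_mix(latent_a, latent_b, mask):
    """Take cells from latent_a where mask is True and from latent_b elsewhere.

    latent_a, latent_b: arrays of shape (H, W, C).
    mask: boolean array of shape (H, W), broadcast over the channel axis.
    """
    return np.where(mask[..., None], latent_a, latent_b)


def edit_attribute(latent, direction, strength, mask=None):
    """Shift spatial cells along a pre-computed attribute direction.

    direction: array of shape (C,), e.g. a hypothetical "age" direction.
    mask: optional boolean (H, W) array restricting the edit to a region.
    """
    shift = strength * direction  # shape (C,), broadcasts over (H, W, C)
    if mask is None:
        return latent + shift
    # Cells outside the mask receive a zero shift and stay unchanged.
    return latent + mask[..., None] * shift
```

Under these assumptions, a regional edit is just a masked addition, which is why a spatial representation could plausibly handle cases such as editing one of several faces in an image independently of the others.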

Spatial Latent Representations in Generative Adversarial Networks for Image Generation
