Spatial Latent Representations in Generative Adversarial Networks for Image Generation

Maciej Sypetkowski
2023-03-26
Abstract: In the majority of GAN architectures, the latent space is defined as a set of vectors of given dimensionality. Such representations are not easily interpretable and do not capture spatial information of image content directly. In this work, we define a family of spatial latent spaces for StyleGAN2, capable of capturing more details and representing images that are out-of-sample in terms of the number and arrangement of object parts, such as an image of multiple faces or a face with more than two eyes. We propose a method for encoding images into our spaces, together with an attribute model capable of performing attribute editing in these spaces. We show that our spaces are effective for image manipulation and encode semantic information well. Our approach can be used on pre-trained generator models, and attribute editing can be done using pre-generated direction vectors, making the barrier to entry for experimentation and use extremely low. We propose a regularization method for optimizing latent representations, which equalizes distributions of parts of latent spaces, making representations much closer to generated ones. We use it for encoding images into spatial spaces to obtain a significant improvement in quality while keeping semantics and the ability to use our attribute model for editing. In total, our methods improve encoding quality by as much as 30% in terms of LPIPS score compared to standard methods, while keeping semantics. Additionally, we propose a StyleGAN2 training procedure on our spatial latent spaces, together with a custom spatial latent representation distribution that makes spatially closer elements in the representation more dependent on each other than farther elements. This approach improves the FID score by 29% on SpaceNet, and is able to generate consistent images of arbitrary sizes on spatially homogeneous datasets, like satellite imagery.
Computer Vision and Pattern Recognition,Image and Video Processing
What problem does this paper attempt to address?
This paper addresses two major challenges that Generative Adversarial Networks (GANs) face in image generation tasks:

1. **Differences between images reconstructed from latent representations and the original images**: This may be due to insufficient parameters in the latent representation or to sub-optimal solutions found during optimization, such as non-convergence, getting stuck in local minima, or using a poorly performing loss function.
2. **Lack of semantic meaning in latent representations**: When the latent representation has too many parameters or no appropriate regularization is used, images reconstructed from the latent representation may look very similar to the originals, yet editing methods fail to achieve specific effects, such as changing the age of a person in the image or performing meaningful image interpolation.

To solve these problems, the author proposes a new spatial latent representation method for the StyleGAN2 architecture, aiming to capture the spatial information of images and thereby improve both the quality of image generation and the ability to perform semantic editing. Specifically, the author's main contributions include:

- **Defining new spatial latent spaces**: These spaces can capture all spatial information and do not require retraining the model.
- **Demonstrating the effectiveness of spatial latent spaces in image editing**: Including spatial mixing and attribute editing.
- **Proposing a method for directly projecting images onto the spatial latent space**: This method is equivariant to translation and provides greater flexibility, for example, supporting multiple faces in one image.
- **Identifying and solving the inconsistency problem between projected and generated latent representations**: A regularization method is proposed to deal with this problem.
- **Proposing a method for training StyleGAN2 in the spatial latent space**: This method samples latent representations from a custom distribution in which spatially closer elements are more dependent on each other, thereby improving the FID score and making it possible to generate consistent images of any size.

Through these methods, the author not only improves the quality of image generation but also enhances the semantic and editing capabilities of latent representations, providing new ideas for the application of GANs in the fields of image generation and editing.
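The idea of a spatial latent distribution in which nearby elements depend on each other more than distant ones can be illustrated with a small NumPy sketch. The text above does not specify the distribution, so this example assumes one simple way to get that property: Gaussian smoothing of white noise over a grid of latent vectors. The names `sample_spatial_latent`, `axis_correlation`, and the `sigma` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_spatial_latent(height, width, dim, sigma=1.5, rng=None):
    """Sample a (height, width, dim) grid of latent vectors.

    Assumption (not from the paper): spatial dependence is induced by
    smoothing independent Gaussian noise with a separable Gaussian kernel.
    Each element keeps unit variance, as in a standard normal latent.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 1-D Gaussian kernel truncated at about 3 sigma.
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    # Unit L2 norm: a weighted sum of i.i.d. N(0, 1) values stays N(0, 1).
    kernel /= np.linalg.norm(kernel)

    # Draw extra border noise so every output cell sees a full kernel window.
    padded = rng.standard_normal((height + 2 * radius, width + 2 * radius, dim))

    # Separable "valid" convolution: first along rows, then along columns.
    rows = sum(k * padded[i:i + height] for i, k in enumerate(kernel))
    return sum(k * rows[:, j:j + width] for j, k in enumerate(kernel))


def axis_correlation(latent, shift):
    """Pearson correlation between cells that are `shift` columns apart."""
    a = latent[:, :-shift].ravel()
    b = latent[:, shift:].ravel()
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(0)
latent = sample_spatial_latent(16, 16, 64, rng=rng)
print("latent grid shape:", latent.shape)
print("correlation at distance 1:", axis_correlation(latent, 1))
print("correlation at distance 12:", axis_correlation(latent, 12))
```

In this sketch, adjacent cells come out strongly correlated, while cells farther apart than the kernel width are independent by construction. Because the sampler is defined for any `height` and `width`, the same procedure can produce latent grids of arbitrary size, which is the property that arbitrary-size generation on spatially homogeneous data relies on.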