Abstract: Generative models have made huge progress in photorealistic image synthesis in recent years. To let humans steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit attributes of the output image, such as orientation or color scheme, by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we propose a self-supervised approach that improves the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations. Specifically, we design randomly sampled Gaussian heatmaps that are encoded into the intermediate layers of generative models as a spatial inductive bias. While the GAN model is trained from scratch, these heatmaps are aligned with the emerging attention of the GAN's discriminator in a self-supervised manner. During inference, users can interact with the spatial heatmaps intuitively, editing the output image by adjusting the scene layout and by moving or removing objects. Moreover, we incorporate DragGAN into our framework, which facilitates fine-grained manipulation within a reasonable time and supports a coarse-to-fine editing process. Extensive experiments show that the proposed method not only enables spatial editing of human faces, animal faces, outdoor scenes, and complicated multi-object indoor scenes, but also improves synthesis quality.
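The heatmap idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how many Gaussians are sampled, how their widths are chosen, or how several of them are combined, so the function names, the fixed `sigma`, and the pixel-wise maximum used for merging are all assumptions made for illustration. The sketch only shows how random Gaussian centers could be rendered into a spatial map, and how moving a center (the kind of user edit described for inference) would change that map.

```python
import math
import random


def gaussian_heatmap(height, width, center, sigma):
    """Render one isotropic Gaussian bump on a height x width grid."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def render(height, width, centers, sigma):
    """Merge one Gaussian per center with a pixel-wise max (an assumption)."""
    maps = [gaussian_heatmap(height, width, c, sigma) for c in centers]
    return [[max(m[y][x] for m in maps) for x in range(width)]
            for y in range(height)]


def sample_centers(height, width, num_centers, rng):
    """Draw random (row, col) centers uniformly inside the grid."""
    return [(rng.uniform(0, height - 1), rng.uniform(0, width - 1))
            for _ in range(num_centers)]


def move_center(centers, index, new_center):
    """Hypothetical edit: relocate one center, leaving the others untouched."""
    edited = list(centers)
    edited[index] = new_center
    return edited


rng = random.Random(0)
centers = sample_centers(16, 16, num_centers=3, rng=rng)
heatmap = render(16, 16, centers, sigma=2.0)

# Simulate a user dragging the first Gaussian to the top-left corner.
edited_centers = move_center(centers, 0, (0.0, 0.0))
edited_heatmap = render(16, 16, edited_centers, sigma=2.0)
```

In a full model, a map like `heatmap` would be resized to the resolution of each intermediate generator layer before being encoded; that step, and the alignment with the discriminator's attention during training, are outside the scope of this sketch.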

Discovering Interpretable Latent Space Directions for 3D-Aware Image Generation

3D-Aware Image Synthesis Via Learning Structural and Textural Representations

Unsupervised Discovery of Interpretable Directions in the GAN Latent Space

Unsupervised Discovery of Disentangled Interpretable Directions for Layer-Wise GAN

3D-aware Image Generation and Editing with Multi-modal Conditions

3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability

Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

Learning an Interpretable Stylized Subspace for 3D-aware Animatable Artforms

Discovering Density-Preserving Latent Space Walks in GANs for Semantic Image Transformations

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Dual Mapping of 2D StyleGAN for 3D-Aware Image Generation and Manipulation (Student Abstract)

3D GANs and Latent Space: A comprehensive survey

LatentSwap3D: Semantic Edits on 3D Image GANs

Interpreting the Latent Space of GANs for Semantic Face Editing

Disentangling the Latent Space of GANs for Semantic Face Editing

3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

2D GANs Meet Unsupervised Single-view 3D Reconstruction

Spatial Steerability of GANs via Self-Supervision from Discriminator

Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs

Interpreting the Latent Space of GANs via Correlation Analysis for Controllable Concept Manipulation