Abstract: Disentanglement learning aims to separate explanatory factors of variation so that different attributes of the data can be characterized and isolated, which promotes efficient inference for downstream tasks. Mainstream disentanglement approaches based on generative adversarial networks (GANs) learn interpretable data representations. However, most GAN-based works do not discuss the latent subspace and therefore give insufficient consideration to the variation of independent factors. Although some recent research analyzes the latent space of pretrained GANs for image editing, it does not emphasize learning representations directly from the subspace perspective. Appropriate subspace properties could facilitate feature representation learning that satisfies the independent-variation requirements of the obtained explanatory factors, which is crucial for better disentanglement. In this work, we propose a unified framework for disentanglement that investigates latent subspace learning (SL) in GANs. The novel GAN-based architecture, named OSRGAN, explores orthogonal subspace representation (OSR) on a vanilla GAN. To guide a subspace toward strong correlation, low redundancy, and robust distinguishability, our OSR includes three stages: self-latent-aware, orthogonal subspace-aware, and structure representation-aware. First, the self-latent-aware stage makes the latent subspace strongly correlated with the data space so as to discover interpretable factors, though with poor independence of variation. Second, the orthogonal subspace-aware stage adaptively learns a set of 1-D linear subspaces spanned by orthogonal bases in the latent space. There is less redundancy between these subspaces, which expresses the corresponding independence. Third, the structure representation-aware stage aligns the projection on the orthogonal subspace with the latent variables.
Accordingly, the feature representation in each linear subspace becomes distinguishable, enhancing the independent expression of interpretable factors. In addition, we design an alternating optimization step that achieves a tradeoff in training OSRGAN across these properties. Although orthogonality is strictly constrained, the loss weight of the distinguishability induced by orthogonality can be adjusted and balanced against the correlation constraint. This tradeoff training prevents OSRGAN from overemphasizing any single property and damaging the expressiveness of the feature representation; it takes into account both the interpretable factors and their independent variation characteristics. Meanwhile, alternating optimization leaves the cost and efficiency of forward inference unchanged and does not add computational complexity. In theory, we clarify the significance of OSR, which brings better independence of factors along with interpretability, since the correlation can converge to a high range faster. Moreover, the convergence behavior analysis, covering the objective functions under different constraints and the evaluation curves over iterations, shows that our model has enhanced stability and converges toward a higher peak for disentanglement. To assess performance in downstream tasks, we compare against state-of-the-art GAN-based and even VAE-based approaches on different datasets. OSRGAN achieves higher disentanglement scores on the FactorVAE, SAP, MIG, and VP metrics. The experimental results show that our GAN-based framework has considerable advantages in disentanglement.
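
The idea of 1-D linear subspaces spanned by orthogonal bases can be illustrated with a small sketch. This is not the OSRGAN implementation: the abstract says the bases are learned adaptively during training, whereas the sketch below substitutes a fixed Gram-Schmidt step as a stand-in. The function names (`orthonormalize`, `orthogonality_penalty`, `project`) and the example vectors are hypothetical, chosen only for illustration. The penalty shown, the squared Frobenius norm of B B^T - I, is one common way to measure deviation from orthonormality and is an assumption here, not a loss taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt (modified variant): orthonormal vectors with the same span.

    Stand-in for the adaptive basis learning described in the abstract.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def orthogonality_penalty(basis):
    """Squared Frobenius norm of (B B^T - I); zero iff the rows are orthonormal."""
    total = 0.0
    for i, bi in enumerate(basis):
        for j, bj in enumerate(basis):
            target = 1.0 if i == j else 0.0
            total += (dot(bi, bj) - target) ** 2
    return total


def project(z, basis):
    """Coordinate of latent vector z in each 1-D subspace span{b_k}."""
    return [dot(z, b) for b in basis]


# Hypothetical, non-orthogonal starting directions in a 3-D latent space.
raw = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
basis = orthonormalize(raw)

# A hypothetical latent code and its per-subspace coordinates.
z = [0.5, -1.0, 2.0]
coords = project(z, basis)

# With a full orthonormal basis, the coordinates reconstruct z exactly
# (up to floating-point error), so no information is shared between axes.
recon = [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(len(z))]
```

Because the basis vectors are mutually orthogonal, changing one coordinate in `coords` moves the reconstructed latent code along a single direction only, which is the sense in which each 1-D subspace can vary independently of the others.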

Orthogonal Subspace Representation for Generative Adversarial Networks
