Representation Decomposition for Image Manipulation and Beyond

Shang-Fu Chen, Jia-Wei Yan, Ya-Fan Su, Yu-Chiang Frank Wang
DOI: https://doi.org/10.48550/arXiv.2011.00788
2022-03-23
Abstract: Representation disentanglement aims at learning interpretable features, so that the output can be recovered or manipulated accordingly. While existing works like infoGAN and AC-GAN exist, they choose to derive disjoint attribute code for feature disentanglement, which is not applicable for existing/trained generative models. In this paper, we propose a decomposition-GAN (dec-GAN), which is able to achieve the decomposition of an existing latent representation into content and attribute features. Guided by the classifier pre-trained on the attributes of interest, our dec-GAN decomposes the attributes of interest from the latent representation, while data recovery and feature consistency objectives enforce the learning of our proposed method. Our experiments on multiple image datasets confirm the effectiveness and robustness of our dec-GAN over recent representation disentanglement models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses representation disentanglement for existing generative models: decomposing the latent representation of an already-trained generator into content and attribute features. Existing methods such as infoGAN and AC-GAN must fix the attributes to be disentangled before training, so they cannot be applied directly to already-trained generative models. This limits their flexibility: if the attributes of interest change, the entire model has to be retrained. As generative models grow in scale, training them also becomes very time-consuming and resource-intensive.

To solve these problems, the authors propose a new generative adversarial network, Decomposition-GAN (dec-GAN for short). dec-GAN extracts the attribute features of interest from the existing latent representation without retraining the generator, while maintaining data recovery and feature consistency. This makes handling specific attributes more flexible, and it allows existing high-level generative models to be used to describe each disentangled feature, which improves the quality and interpretability of image manipulation.

Specifically, dec-GAN achieves its goals in the following ways:

1. **Attribute Guidance**: A pre-trained attribute classifier guides the attribute encoder to extract attribute-related features.
2. **Content and Attribute Consistency**: A consistency loss on the content and attribute features is minimized, so that each disentangled feature contains only content information or only attribute information.
3. **Data Recovery**: A reconstruction loss ensures that the generated images are realistic enough.

The experimental results show that dec-GAN outperforms existing disentanglement models on multiple image datasets, especially in handling continuous attributes (such as posture) and discrete attributes (such as facial expressions). In addition, the training time of dec-GAN is significantly reduced, further demonstrating its effectiveness and flexibility.
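To make the three objectives concrete, here is a minimal NumPy sketch of how such losses could be computed for a frozen generator and a frozen attribute classifier. It is an illustration under stated assumptions, not the authors' implementation: the linear layers, the dimensions, the attribute-swap form of the consistency term, and every name (`generate`, `classify`, `encode`, `merge`, `dec_losses`) are hypothetical, and the adversarial part of the GAN is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, C_DIM, A_DIM, X_DIM, N_CLASSES = 16, 12, 4, 32, 3  # illustrative sizes

# Frozen "pre-trained" parts (random stand-ins, never updated).
W_gen = rng.normal(size=(Z_DIM, X_DIM))       # generator: latent z -> image x
W_cls = rng.normal(size=(X_DIM, N_CLASSES))   # attribute classifier: x -> logits

# Parts that would be trained (hypothetical linear layers).
W_content = rng.normal(size=(Z_DIM, C_DIM)) * 0.1        # content encoder
W_attr = rng.normal(size=(Z_DIM, A_DIM)) * 0.1           # attribute encoder
W_head = rng.normal(size=(A_DIM, N_CLASSES)) * 0.1       # label head on attribute code
W_merge = rng.normal(size=(C_DIM + A_DIM, Z_DIM)) * 0.1  # (content, attribute) -> latent


def generate(z):
    return np.tanh(z @ W_gen)


def classify(x):
    return x @ W_cls


def encode(z):
    """Decompose a latent batch into (content, attribute) features."""
    return z @ W_content, z @ W_attr


def merge(content, attr):
    """Recombine content and attribute features into a latent batch."""
    return np.concatenate([content, attr], axis=1) @ W_merge


def softmax_cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def dec_losses(z_src, z_ref):
    c_src, a_src = encode(z_src)
    _, a_ref = encode(z_ref)

    # 1. Attribute guidance: the attribute code should predict the label
    #    that the frozen classifier assigns to the generated image.
    labels = classify(generate(z_src)).argmax(axis=1)
    guidance = softmax_cross_entropy(a_src @ W_head, labels)

    # 2. Consistency (assumed swap form): after swapping in another sample's
    #    attribute code, re-encoding should return the same two codes.
    c_back, a_back = encode(merge(c_src, a_ref))
    consistency = float(np.mean((c_back - c_src) ** 2)
                        + np.mean((a_back - a_ref) ** 2))

    # 3. Data recovery: recombining a sample's own codes should
    #    reproduce the image generated from the original latent.
    recovery = float(np.mean((generate(merge(c_src, a_src))
                              - generate(z_src)) ** 2))

    return {"guidance": guidance, "consistency": consistency, "recovery": recovery}


z_a = rng.normal(size=(8, Z_DIM))
z_b = rng.normal(size=(8, Z_DIM))
print(dec_losses(z_a, z_b))
```

In a real setup, only the encoders, the label head, and the merge layer would receive gradients, while the generator and classifier stay fixed. That is what lets the attributes of interest change without retraining the generator.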