Abstract:We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class. Examples of such observations include images of a set of labeled objects captured at different viewpoints, or recordings of set of speakers dictating multiple phrases. In both instances, the intra-class diversity is the source of the unspecified factors of variation: each object is observed at multiple viewpoints, and each speaker dictates multiple phrases. Learning to disentangle the specified factors from the unspecified ones becomes easier when strong supervision is possible. Suppose that during training, we have access to pairs of images, where each pair shows two different objects captured from the same viewpoint. This source of alignment allows us to solve our task using existing methods. However, labels for the unspecified factors are usually unavailable in realistic scenarios where data acquisition is not strictly controlled. We address the problem of disentaglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of generalizing to unseen classes and intra-class variabilities.

Disentangling semantics in language through VAEs and a certain architectural choice

Generating Sentences from Disentangled Syntactic and Semantic Spaces

Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders

Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles

Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks

Disentangling continuous and discrete linguistic signals in transformer-based sentence embeddings

Guided Variational Autoencoder for Disentanglement Learning

Unpicking Data at the Seams: VAEs, Disentanglement and Independent Components

Explicitly disentangling image content from translation and rotation with spatial-VAE

Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders

A Bilingual Generative Transformer for Semantic Sentence Embedding

Disentangled Variational Auto-Encoder for Semi-supervised Learning

Formal Semantic Geometry over Transformer-based Variational AutoEncoder

Disentangled VAE Representations for Multi-Aspect and Missing Data

Learning Disentangled Discrete Representations

Disentangling Factors of Variation in Deep Representations Using Adversarial Training.

From abstract items to latent spaces to observed data and back: Compositional Variational Auto-Encoder

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

Disentangling Generative Factors of Physical Fields Using Variational Autoencoders

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

DAVA: Disentangling Adversarial Variational Autoencoder