Abstract:

MOTIVATION: While generative models have shown great success in generating high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), generating out-of-distribution samples poses fundamental problems because it is difficult to learn a compact joint distribution across conditions. The canonical example, the conditional variational autoencoder (CVAE), does not explicitly relate conditions during training and hence has no explicit incentive to learn such a compact representation.

RESULTS: We overcome this limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq data, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and from 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%.

AVAILABILITY AND IMPLEMENTATION: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
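To make the distribution-matching idea in the abstract concrete, the following is a minimal NumPy sketch of a maximum mean discrepancy (MMD) estimate between two sets of activations, one per condition. It is an illustration under stated assumptions, not the trVAE implementation: the Gaussian (RBF) kernel, the single bandwidth `gamma`, the biased estimator, and the names `rbf_kernel` and `mmd2` are all choices made here for the example. The abstract does not specify the kernel or the estimator.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


# Hypothetical stand-ins for first-decoder-layer activations under two conditions.
rng = np.random.default_rng(0)
dim = 8
gamma = 1.0 / (2.0 * dim)  # assumed bandwidth heuristic, not taken from the paper
acts_cond_a = rng.normal(size=(200, dim))
acts_cond_b_same = rng.normal(size=(200, dim))            # same distribution as A
acts_cond_b_shift = rng.normal(loc=2.0, size=(200, dim))  # shifted distribution

print("MMD^2, matched conditions:", mmd2(acts_cond_a, acts_cond_b_same, gamma))
print("MMD^2, shifted conditions:", mmd2(acts_cond_a, acts_cond_b_shift, gamma))
```

In a model of the kind the abstract describes, a term like `mmd2` would presumably be added to the usual CVAE loss with some weight, so that activations from different conditions are pulled toward a shared distribution. The weighting and the exact layer wiring are not given in the abstract and are left out of this sketch.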

Disentangling the Spatial Structure and Style in Conditional VAE

Conditional out-of-sample generation for unpaired data using trVAE

Facial Landmark Disentangled Network with Variational Autoencoder

Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints

Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation

Guided Variational Autoencoder for Disentanglement Learning

Rethinking Controllable Variational Autoencoders

Conditional Out-of-distribution Generation for Unpaired Data Using Transfer VAE

Disentangling Factors of Variation in Deep Representations Using Adversarial Training

Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations

Data-Dependent Conditional Priors for Unsupervised Learning of Multimodal Data

Neighborhood Geometric Structure-Preserving Variational Autoencoder for Smooth and Bounded Data Sources

Learning Disentangled Discrete Representations

From abstract items to latent spaces to observed data and back: Compositional Variational Auto-Encoder

Learning Manifold Dimensions with Conditional Variational Autoencoders

On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond

C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior

Disentangled Variational Auto-Encoder for Semi-supervised Learning

Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders

Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders