Abstract: Single-cell data is driving new insights into the spatiotemporal dynamics of cells and individual disease susceptibility. However, accurately identifying cell states across diverse cohorts remains challenging, as both biological variation and technical biases cause distributional shifts in the data. Separating these effects is crucial for capturing cellular heterogeneity and ensuring interpretability. To address this, we developed inVAE, a conditionally invariant deep generative model based on variational autoencoders. inVAE models the latent space as a combination of invariant variables, which encode true biological signals, and spurious variables, which capture technical biases. By conditioning the prior distribution of cells on biological covariates, such as disease variants, inVAE identifies high-resolution cell states in the invariant representation. Enforcing independence between the two representations disentangles biological signals from noise, yielding a more interpretable and generalizable model with causal semantics. inVAE outperformed existing methods across four cellular atlases of the human heart and lung, while uncovering novel cell states. It precisely stratified cell atlas donors based on the genetic impact of pathogenic variants, and excelled in predicting cell types and disease in unseen data, demonstrating its generalizability as a reference model for label transfer. Furthermore, inVAE accurately identified temporal cell states and trajectories from developmental datasets, and captured spatial cell states in a spatially resolved atlas. In summary, inVAE provides a powerful method for integrating multivariate single-cell transcriptomics data. By leveraging prior knowledge such as metadata, it accounts for biological variation and improves latent space interpretability by disentangling biological and technical sources of variation. These capabilities enable deeper insights into cellular heterogeneity and its role in disease progression.
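
The split latent space described in the abstract can be illustrated with a minimal sketch. The abstract does not give the training objective, so everything below is an assumption made for illustration: diagonal Gaussian posteriors and priors, the hypothetical lookup tables `BIO_PRIOR` and `TECH_PRIOR`, the label names, and the latent sizes. The sketch only shows how a KL regularizer could be computed when the invariant block has a prior conditioned on a biological covariate, the spurious block has a prior conditioned on a technical covariate, and the two priors are independent.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag var_q) || N(mu_p, diag var_p) ), summed over dimensions."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

# Hypothetical conditional priors (assumption, not from the paper):
# each covariate value selects the (mean, variance) of its latent block.
BIO_PRIOR = {
    "healthy": ([0.0, 0.0], [1.0, 1.0]),
    "variant": ([1.0, -1.0], [1.0, 1.0]),
}
TECH_PRIOR = {
    "batch_a": ([0.0], [1.0]),
    "batch_b": ([0.5], [2.0]),
}

def split_latent_kl(mu_q, var_q, bio_label, tech_label, n_invariant=2):
    """Return (KL of invariant block, KL of spurious block).

    The first n_invariant latent dimensions are treated as invariant and are
    compared with the prior selected by the biological covariate; the
    remaining dimensions are treated as spurious and are compared with the
    prior selected by the technical covariate. Because the assumed prior
    factorizes across the two blocks, the total KL is the sum of both terms.
    """
    mu_b, var_b = BIO_PRIOR[bio_label]
    mu_t, var_t = TECH_PRIOR[tech_label]
    kl_inv = gaussian_kl(mu_q[:n_invariant], var_q[:n_invariant], mu_b, var_b)
    kl_spur = gaussian_kl(mu_q[n_invariant:], var_q[n_invariant:], mu_t, var_t)
    return kl_inv, kl_spur

# Usage: a posterior centred at the origin, scored against the "variant" prior.
kl_inv, kl_spur = split_latent_kl([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], "variant", "batch_a")
```

In a full model these two terms would be combined with a reconstruction likelihood for the expression counts; that part, and the learned (rather than tabulated) prior networks, are left out of this sketch.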

Conditional Out-of-distribution Generation for Unpaired Data Using Transfer VAE.

Conditional out-of-sample generation for unpaired data using trVAE

Out-of-distribution Prediction with Disentangled Representations for Single-Cell RNA Sequencing Data

inVAE: Conditionally invariant representation learning for generating multivariate single-cell reference maps

Iterative VAE as a predictive brain model for out-of-distribution generalization

Privacy-preserving datasets by capturing feature distributions with Conditional VAEs

Virtual Conditional Generative Adversarial Networks

Similarity-assisted variational autoencoder for nonlinear dimension reduction with application to single-cell RNA sequencing data

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training

Recurrent Variational Autoencoders for Learning Nonlinear Generative Models in the Presence of Outliers.

Multilinear Latent Conditioning for Generating Unseen Attribute Combinations

Heart transplantation in an 8-month-old girl. 10th anniversary report.

BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

A Conditional Flow Variational Autoencoder for Controllable Synthesis of Virtual Populations of Anatomy

eVAE: Evolutionary Variational Autoencoder

TimeVAE: A Variational Auto-Encoder for Multivariate Time Series Generation

Modeling conditional distributions of neural and behavioral data with masked variational autoencoders

Conditional Unscented Autoencoders for Trajectory Prediction

Condition-transforming Variational Autoencoder for Conversation Response Generation.

Tychite, Na6Mg2(SO4)(CO3)4: structure analysi

Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations.