Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis

2023-07-02

Abstract: This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors. Our approach identifies both the spurious and the invariant latent features necessary for accurate reconstruction by placing distinct conditional priors on the latent features. The invariant signals are disentangled from noise by enforcing independence, which facilitates the construction of an interpretable model with causal semantics. By exploiting the interplay between data domains and labels, our method simultaneously identifies invariant features and builds invariant predictors. We apply our method to grand biological challenges, such as data integration in single-cell genomics, with the aim of capturing biological variation across datasets with many samples obtained from different conditions or multiple laboratories. Our approach allows specific biological mechanisms, including gene programs, disease states, or treatment conditions, to be incorporated into the data integration process, bridging the gap between theoretical assumptions and real biological applications. Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest. Through extensive benchmarking using large-scale human hematopoiesis and human lung cancer data, we validate the superiority of our approach over existing methods and demonstrate that it can empower deeper insights into cellular heterogeneity and the identification of disease cell states.

Machine Learning, Quantitative Methods

What problem does this paper attempt to address?

This paper proposes an approach that leverages domain variability to learn representations that are invariant to unwanted variation and confounding factors. The problem is particularly critical in biology, especially in the integration and classification of single-cell genomics data. Such data often come from different experimental conditions or laboratories and therefore mix biological variation with technical bias. Traditional methods often struggle to separate the relevant biological signal from noise.

The main contributions of the paper are:

1. Reexamining the fundamental assumptions of invariant representation learning and pointing out that, in complex biological processes, independent and invariant causal mechanisms may not be sufficient to explain all phenomena.
2. Proposing an invariant representation learning method that identifies both spurious and invariant variables.
3. Demonstrating that the proposed method is identifiable up to simple transformations and permutations of the latent variables.
4. Validating the method on single-cell data analysis, cell state identification, and cell type annotation using large-scale human hematopoiesis and lung cancer single-cell RNA sequencing data.

The researchers construct a conditionally invariant deep generative model that integrates single-cell genomics data while preserving biological variation across datasets. The model can incorporate specific biological mechanisms, such as gene programs, disease states, or treatment conditions, into the data integration process, which deepens the understanding of cellular heterogeneity and helps identify disease cell states. In the reported benchmarks, the approach outperforms existing methods for single-cell data integration.
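The idea of placing distinct conditional priors on two groups of latent features can be illustrated with a small sketch. This is not the authors' implementation: it assumes diagonal Gaussian distributions, assumes that the invariant block is conditioned on a label (here a disease state) and the spurious block on a domain (here a laboratory), and uses hypothetical hand-written prior tables (`PRIOR_INV`, `PRIOR_SPUR`) where a trained model would learn the prior parameters. The function names are also illustrative.

```python
import math


def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians given as lists."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def split_prior_kl(mu_q, var_q, n_inv, prior_inv, prior_spur):
    """Prior-matching term for a latent vector split into two blocks.

    The first n_inv dimensions are compared with the invariant prior and
    the remaining ones with the spurious prior. Keeping the two KL terms
    separate corresponds to a prior that factorizes across the blocks.
    """
    mu_i, var_i = prior_inv
    mu_s, var_s = prior_spur
    kl_inv = kl_diag_gauss(mu_q[:n_inv], var_q[:n_inv], mu_i, var_i)
    kl_spur = kl_diag_gauss(mu_q[n_inv:], var_q[n_inv:], mu_s, var_s)
    return kl_inv, kl_spur


# Hypothetical prior parameters as (means, variances); assumed, not from the paper.
PRIOR_INV = {  # invariant block, conditioned on the label
    "healthy": ([0.0, 0.0], [1.0, 1.0]),
    "disease": ([1.0, -1.0], [1.0, 1.0]),
}
PRIOR_SPUR = {  # spurious block, conditioned on the domain
    "lab_A": ([0.0], [1.0]),
    "lab_B": ([2.0], [0.5]),
}

# Toy encoder output for one cell: two invariant dimensions, one spurious.
mu_q = [1.0, -1.0, 2.0]
var_q = [1.0, 1.0, 0.5]

kl_inv, kl_spur = split_prior_kl(
    mu_q, var_q, 2, PRIOR_INV["disease"], PRIOR_SPUR["lab_B"]
)
print(kl_inv, kl_spur)
```

In a variational autoencoder, terms of this kind would be added to the reconstruction loss. Because each block is matched to its own conditional prior, label-related variation is encouraged to settle in the invariant block and domain-related variation in the spurious block.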
