Partially Shared Multi-Modal Embedding Learns Holistic Representation of Cell State
Xinyi Zhang, GV Shivashankar, Caroline Uhler
DOI: https://doi.org/10.1101/2024.10.01.615977
2024-10-03
Abstract: Experimental technologies for jointly measuring different data modalities at the single-cell level offer different windows into cell state. To obtain a holistic understanding of cell state, computational methods are needed that carefully integrate the different views to capture shared information as well as tease apart modality-specific information. We present a computational framework that automatically learns partial information sharing between multiple modalities by using an Autoencoder with a Partially Overlapping Latent space learned through Latent Optimization (APOLLO). On paired scRNA-seq and scATAC-seq data (SHARE-seq) and paired scRNA-seq and surface protein data (CITE-seq), we demonstrate that APOLLO comprehensively and automatically identifies and distinguishes between information captured by both modalities, in the shared latent space, and modality-specific information. Beyond sequencing modalities, large-scale multiplexed single-cell imaging datasets, such as the Human Protein Atlas, are becoming available that allow studying how protein localization relates to function. While chromatin, microtubules or ER are standardly stained as a reference, little is known about the information shared between these stains. We found that APOLLO enables the prediction of missing modalities, such as unmeasured protein stains, and allows disentangling which modality or cellular compartment is linked with a specific phenotype, such as the variability in protein localization observed across single cells. Collectively, APOLLO enables explicit learning of shared and modality-specific information leading to a more holistic understanding of cell state and the underlying regulatory mechanisms. APOLLO is a general framework that can be applied to any multi-modal data well beyond the single-cell domain including, for example, large-scale medical biobanks.
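To make the idea of a partially overlapping latent space concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that "latent optimization" means each cell's latent code is a free parameter trained jointly with the decoders by gradient descent, and it uses linear decoders and a squared reconstruction error for brevity. All names (`train`, `S`, `U`, `V`, `Wa`, `Wb`) and the synthetic data are hypothetical.

```python
import random

def matvec(W, z):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def train(xa, xb, ks=1, ka=1, kb=1, lr=0.01, steps=300, seed=0):
    """Fit per-cell latents and two linear decoders by per-cell gradient steps.

    Modality A is decoded from [shared, A-specific] and modality B from
    [shared, B-specific], so only the shared block sees both modalities.
    """
    rng = random.Random(seed)
    n, da, db = len(xa), len(xa[0]), len(xb[0])

    def init(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

    S, U, V = init(n, ks), init(n, ka), init(n, kb)  # free latent codes per cell
    Wa, Wb = init(da, ks + ka), init(db, ks + kb)    # linear decoders (assumption)
    history = []
    for _ in range(steps):
        loss = 0.0
        for i in range(n):
            za, zb = S[i] + U[i], S[i] + V[i]        # concatenated latent codes
            ea = [p - t for p, t in zip(matvec(Wa, za), xa[i])]
            eb = [p - t for p, t in zip(matvec(Wb, zb), xb[i])]
            loss += sum(e * e for e in ea) + sum(e * e for e in eb)
            # Gradients of the squared error with respect to the latent codes,
            # computed before the decoders are updated.
            ga = [2 * sum(ea[d] * Wa[d][j] for d in range(da)) for j in range(ks + ka)]
            gb = [2 * sum(eb[d] * Wb[d][j] for d in range(db)) for j in range(ks + kb)]
            # Decoder updates.
            for d in range(da):
                for j in range(ks + ka):
                    Wa[d][j] -= lr * 2 * ea[d] * za[j]
            for d in range(db):
                for j in range(ks + kb):
                    Wb[d][j] -= lr * 2 * eb[d] * zb[j]
            # Latent updates: the shared block receives gradients from both
            # modalities, each specific block only from its own modality.
            for j in range(ks):
                S[i][j] -= lr * (ga[j] + gb[j])
            for j in range(ka):
                U[i][j] -= lr * ga[ks + j]
            for j in range(kb):
                V[i][j] -= lr * gb[ks + j]
        history.append(loss / n)                     # mean loss over this pass
    return S, U, V, Wa, Wb, history

# Synthetic paired data: factor s drives both modalities, u only A, v only B.
data_rng = random.Random(1)
xa, xb = [], []
for _ in range(40):
    s, u, v = data_rng.gauss(0, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    xa.append([s, s + u, u])
    xb.append([s, v, s - v])

S, U, V, Wa, Wb, history = train(xa, xb)
print("mean reconstruction loss: first pass %.3f, last pass %.3f" % (history[0], history[-1]))
```

In this toy setup, the shared codes `S` are the only parameters that must help reconstruct both modalities, which is the structural constraint the abstract describes. A real model would use nonlinear networks and additional training details that are not specified in the abstract.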
Cell Biology