Biologically Interpretable VAE with Supervision for Transcriptomics Data Under Ordinal Perturbations

Seyednami Niyakan, Byung-Jun Yoon, Xiaoning Qian, Xihaier Luo
DOI: https://doi.org/10.1101/2024.03.28.587231
2024-03-29
Abstract: Latent variable models such as the Variational Auto-Encoders (VAEs) have shown impressive performance for inferring expression patterns for cell subtyping and biomarker identification from transcriptomics data. However, the limited interpretability of their latent variables obscures deriving meaningful biological understanding of cellular responses to different external and internal perturbations. We here propose a novel deep learning framework, EXPORT (EXplainable VAE for Ordinally perturbed Transcriptomics data), for analyzing ordinally perturbed transcriptomics data that can incorporate any biological pathway knowledge in the VAE latent space. With the corresponding pathway-informed decoder, the learned latent expression patterns can be explained as pathway-level responses to perturbations, offering direct interpretability with biological understanding. More importantly, we explicitly model the ordinal nature of many real-world perturbations into the EXPORT framework by training an auxiliary ordinal regressor neural network to capture corresponding expression changes in the VAE latent representations, for example under different dosage levels of radiation exposure. By incorporating ordinal constraints during the training of our proposed framework, we further enhance the model interpretability by guiding the VAE latent space to organize perturbation responses in a hierarchical manner. We demonstrate the utility of the inferred guided latent space for downstream tasks, such as identifying key regulatory pathways associated with specific perturbation changes by analyzing transcriptomics datasets on both bulk and single-cell data. Overall, we envision that our proposed approach can unravel unprecedented biological intricacies in cellular responses to various perturbations while bringing an additional layer of interpretability to biology-inspired deep learning models.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the limited interpretability of latent variable models applied to transcriptomics data collected under ordinal perturbations, such as increasing drug dosages or radiation exposure levels. Existing Variational Autoencoders (VAEs) can extract biological signal from large, heterogeneous perturbation-induced gene expression data, but their latent variables are hard to interpret, so the models behave as "black boxes." To overcome this limitation, the paper proposes EXPORT (EXplainable VAE for Ordinally perturbed Transcriptomics data), a deep learning framework that incorporates biological pathway knowledge into the VAE latent space and pairs it with a pathway-informed decoder, so that latent expression patterns can be read as pathway-level responses. EXPORT also trains an auxiliary ordinal regression neural network on the latent representations and adds ordinal constraints to the training loss, which guides the latent space to organize perturbation responses in a hierarchical manner. The guided latent space is then used for downstream tasks such as identifying key regulatory pathways associated with specific perturbation changes, demonstrated on both bulk and single-cell datasets. Overall, the authors expect the approach to reveal finer detail in cellular responses to perturbations while adding a layer of interpretability to biology-inspired deep learning models.
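The two ideas in this summary, a pathway-informed decoder and an ordinal term in the training loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear decoder, the binary pathway mask, the cumulative-threshold encoding of ordinal levels, and all function names (`masked_linear_decoder`, `ordinal_targets`, `ordinal_loss`) are assumptions chosen for illustration, since the text above does not specify the exact architecture or loss.

```python
import numpy as np

def masked_linear_decoder(z, weights, pathway_mask):
    """Decode pathway-level latents z (n_samples, n_pathways) into genes.

    Assumption: pathway_mask[p, g] is 1 if gene g is annotated to pathway p
    and 0 otherwise, so each latent dimension can only influence the genes
    in its own pathway.
    """
    return z @ (weights * pathway_mask)

def ordinal_targets(levels, n_levels):
    """Encode ordinal level k as K-1 binary targets: [k > 0, k > 1, ...]."""
    thresholds = np.arange(n_levels - 1)
    return (levels[:, None] > thresholds[None, :]).astype(float)

def ordinal_loss(z, w, biases, levels):
    """Mean binary cross-entropy over the K-1 'level > j' tasks.

    Assumption: one shared projection w maps the latent to a scalar score,
    and each threshold j has its own bias (a cumulative-threshold regressor).
    """
    n_levels = biases.shape[0] + 1
    logits = (z @ w)[:, None] + biases[None, :]
    targets = ordinal_targets(levels, n_levels)
    # Numerically stable cross-entropy computed directly from logits.
    per_task = (np.maximum(logits, 0.0) - logits * targets
                + np.log1p(np.exp(-np.abs(logits))))
    return per_task.mean()

# Toy example: 2 pathways, 3 genes; gene 2 belongs only to pathway 1.
mask = np.array([[1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
decoded = masked_linear_decoder(np.array([[1.0, 2.0]]), np.ones((2, 3)), mask)

# Toy example: a 1-D latent that increases with dose level (0 < 1 < 2).
z = np.array([[-2.0], [0.0], [2.0]])
w = np.array([3.0])
biases = np.array([3.0, -3.0])
loss_ordered = ordinal_loss(z, w, biases, np.array([0, 1, 2]))
loss_reversed = ordinal_loss(z, w, biases, np.array([2, 1, 0]))
```

In this toy setup the loss is lower when the latent coordinate increases with the perturbation level than when the labels are reversed, which is the kind of pressure an ordinal term could place on a latent space. In a full model, such a term would be added to the usual VAE reconstruction and KL terms; how the terms are weighted is not stated in the text above.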