Explainable autoencoder-based representation learning for gene expression data

Yang Yu, Pathum Kossinna, Qing Li, Wenyuan Liao, Qingrun Zhang
DOI: https://doi.org/10.1101/2021.12.21.473742
2021-12-23
Abstract: Modern machine learning methods have been extensively utilized in gene expression data analysis. In particular, autoencoders (AEs) have been employed in processing noisy and heterogeneous RNA-Seq data. However, AEs usually yield “black-box” hidden variables that are difficult to interpret, hindering downstream experimental validation and clinical translation. To bridge the gap between complicated models and biological interpretations, we developed a tool, XAE4Exp (eXplainable AutoEncoder for Expression data), which integrates AE and SHapley Additive exPlanations (SHAP), a flagship technique in the field of eXplainable AI (XAI). It quantitatively evaluates the contribution of each gene to the hidden structure learned by an AE, substantially improving the explainability of AE outcomes. By applying XAE4Exp to The Cancer Genome Atlas (TCGA) breast cancer gene expression data, we identified genes that are not differentially expressed, as well as pathways in various cancer-related classes. This tool will enable researchers and practitioners to analyze high-dimensional expression data intuitively, paving the way towards broader uses of deep learning.
Availability: Open source at https://github.com/QingrunZhangLab/Explainable-Deep-Autoencoder
Contacts: qingrun.zhang@ucalgary.ca and wliao@ucalgary.ca
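The idea of attributing a hidden variable to individual genes can be illustrated with a toy calculation. The sketch below is not the XAE4Exp implementation: it assumes a hypothetical four-gene encoder with a single hidden unit and made-up weights (no training), and it computes exact Shapley values by enumerating gene subsets against a single reference expression profile. SHAP tooling typically approximates this quantity and averages over a background dataset, so the single baseline here is a simplifying assumption.

```python
from itertools import combinations
from math import factorial, tanh

# Hypothetical toy encoder: 4 "genes" -> 1 hidden unit.
# Weights and bias are made up for illustration; they are not trained
# and are not taken from XAE4Exp.
WEIGHTS = [0.8, -0.5, 0.0, 0.3]
BIAS = 0.1


def hidden_unit(x):
    """Value of the single hidden unit for an expression vector x."""
    return tanh(sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS)


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Genes outside a coalition are set to their baseline value
    (a single reference profile, which is an assumption of this sketch).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by all coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of gene i to this coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


sample = [2.0, 1.0, 3.0, 0.5]    # hypothetical expression values
baseline = [1.0, 1.0, 1.0, 1.0]  # hypothetical reference profile
phi = shapley_values(hidden_unit, sample, baseline)

for gene, value in enumerate(phi):
    print(f"gene {gene}: contribution {value:+.4f}")
```

The contributions sum to the difference between the hidden unit's value for the sample and for the baseline. A gene with zero weight, or one whose expression equals its baseline value, receives a contribution of zero. Exact enumeration grows exponentially with the number of genes, so it is only feasible for toy inputs like this one.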