Principled feature attribution for unsupervised gene expression analysis

Ting-I Lee, Su-In Lee, J. Russell, Matt Kaeberlein, Ben W. Blue, Joseph D. Janizek, Safiye Celik, Anna Spiro
DOI: https://doi.org/10.1101/2022.05.03.490535
2022-05-04
bioRxiv
Abstract: As interest in unsupervised deep learning models for the analysis of gene expression data has grown, an increasing number of methods have been developed to make these models more interpretable. These methods fall into two groups: (1) post hoc analyses of black box models through feature attribution methods and (2) approaches that build inherently interpretable models through biologically constrained architectures. In this work, we argue that these approaches are not mutually exclusive and can in fact be usefully combined. We propose a novel unsupervised pathway attribution method that, when combined with biologically constrained neural network models, identifies major sources of transcriptomic variation better than prior methods. We demonstrate how principled feature attributions aid in the analysis of a variety of single-cell datasets. Finally, we apply our approach to a large dataset of post-mortem brain samples from patients with Alzheimer's disease and show that it identifies Mitochondrial Respiratory Complex I as an important factor in this disease.
Biology, Computer Science
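The abstract combines a biologically constrained architecture with post hoc attribution. As a rough illustration only, the following NumPy sketch builds a pathway-masked linear autoencoder, where each latent node can read only the genes in one pathway. It then attributes the reconstruction loss to each latent pathway node using integrated gradients from a zero baseline. The linear encoder and decoder, the zero baseline, integrated gradients as the attribution technique, and every function name (`pathway_mask`, `encode`, `recon_loss`, `integrated_gradients`) are hypothetical choices made for this example. They are not the authors' model or their attribution method.

```python
import numpy as np

# Hypothetical sketch: pathway-constrained linear autoencoder + integrated
# gradients on the latent pathway nodes. Not the paper's actual method.

def pathway_mask(n_genes, pathways):
    """Binary genes x pathways mask: entry (g, p) is 1 if gene g is in pathway p."""
    mask = np.zeros((n_genes, len(pathways)))
    for p, genes in enumerate(pathways):
        mask[genes, p] = 1.0
    return mask

def encode(x, w_enc, mask):
    # Masked linear layer: latent node p only sees genes annotated to pathway p.
    return x @ (w_enc * mask)

def recon_loss(z, x, w_dec):
    # Squared reconstruction error of a linear decoder.
    return float(np.sum((x - z @ w_dec) ** 2))

def recon_loss_grad(z, x, w_dec):
    # Analytic gradient of recon_loss with respect to the latent vector z.
    return -2.0 * (x - z @ w_dec) @ w_dec.T

def integrated_gradients(z, x, w_dec, baseline=None, steps=50):
    """Attribute recon_loss(z) - recon_loss(baseline) to each latent pathway node."""
    if baseline is None:
        baseline = np.zeros_like(z)
    # Midpoint Riemann sum along the straight path from baseline to z.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.mean(
        [recon_loss_grad(baseline + a * (z - baseline), x, w_dec) for a in alphas],
        axis=0,
    )
    # Negative entries mark nodes that lower the loss relative to the baseline.
    return (z - baseline) * grads

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pathways = [[0, 1, 2], [2, 3], [4, 5]]  # toy gene sets (hypothetical)
    mask = pathway_mask(6, pathways)
    w_enc = rng.normal(size=(6, 3))
    w_dec = rng.normal(size=(3, 6))
    x = rng.normal(size=6)  # one toy expression profile
    z = encode(x, w_enc, mask)
    print(integrated_gradients(z, x, w_dec))
```

Because the decoder is linear, the loss is quadratic in `z`, so the integrand along the path is linear and the midpoint sum satisfies completeness exactly: the attributions sum to the change in reconstruction loss from the baseline. A nonlinear decoder would need more steps for this to hold approximately.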