Abstract: The increasing availability of large-scale single-cell datasets has enabled the detailed description of cell states across multiple biological conditions and perturbations. In parallel, recent advances in unsupervised machine learning, particularly in transfer learning, have enabled fast and scalable mapping of these new single-cell datasets onto reference atlases. The resulting large-scale machine learning models, however, often have millions of parameters, which makes the newly mapped datasets challenging to interpret. Here, we propose expiMap, a deep learning model that enables interpretable reference mapping using biologically understandable entities, such as curated sets of genes and gene programs. The key concept is to replace the uninterpretable nodes in an autoencoder's bottleneck with labeled nodes that map to interpretable lists of genes, such as gene ontologies, biological pathways, or curated gene sets, for which activities are learned as constraints during reconstruction. This is enabled by incorporating predefined gene programs into the reference model while allowing the model to learn new programs de novo and refine existing programs during reference mapping. We show that the model retains integration performance similar to that of existing methods while providing a biologically interpretable framework for understanding cellular behavior. We demonstrate the capabilities of expiMap by applying it to 15 datasets encompassing five different tissues and species. The interpretable nature of the mapping revealed unreported associations between interferon signaling via the RIG-I/MDA5 and GPCR pathways, with differential behavior in CD8+ T cells and CD14+ monocytes in severe COVID-19, as well as the role of annexins in the cellular communication between lymphoid and myeloid compartments in explaining patient response to the applied drugs. Finally, expiMap enabled the direct comparison of a diverse set of pancreatic beta cells from multiple studies, where we observed a strong, previously unreported correlation between the unfolded protein response and asparagine N-linked glycosylation. Altogether, expiMap enables the interpretable mapping of single-cell transcriptome datasets across cohorts, disease states, and other perturbations.

### Competing Interest Statement

Fabian J. Theis consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd, and Omniscope Ltd, and has ownership interest in Dermagnostix GmbH and Cellarity.
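The key concept in the abstract, bottleneck nodes that each map to a labeled gene list, can be illustrated with a masked linear decoder. The following is a minimal sketch under stated assumptions and is not expiMap's actual implementation: the function names `build_mask` and `masked_decode`, the toy gene programs, and the random weights are all hypothetical, and the sketch omits training, the encoder, and the de novo learning of new programs.

```python
import numpy as np

def build_mask(gene_names, gene_programs):
    """Return a binary (n_programs x n_genes) matrix.

    Entry (i, j) is 1 when gene j belongs to program i, else 0.
    """
    index = {gene: j for j, gene in enumerate(gene_names)}
    mask = np.zeros((len(gene_programs), len(gene_names)))
    for i, members in enumerate(gene_programs.values()):
        for gene in members:
            if gene in index:
                mask[i, index[gene]] = 1.0
    return mask

def masked_decode(z, weights, mask):
    """Linear decoder whose weights are zeroed outside each program's genes.

    Because of the mask, latent node i can only influence the genes
    listed in program i, so its value reads as that program's activity.
    """
    return z @ (weights * mask)

# Toy inputs chosen for illustration only (not curated gene sets).
gene_names = ["IFIT1", "ISG15", "INS", "XBP1"]
gene_programs = {
    "interferon_signaling": ["IFIT1", "ISG15"],
    "unfolded_protein_response": ["XBP1"],
}

mask = build_mask(gene_names, gene_programs)
rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)  # hypothetical decoder weights

# One cell in which only the interferon program is active.
z = np.array([[1.0, 0.0]])
x_hat = masked_decode(z, weights, mask)
```

In this sketch, the reconstruction for `INS` is exactly zero because it belongs to no program, and the reconstruction for `XBP1` is zero because its only program has zero activity. Each latent dimension can therefore be read as the activity of one named gene program.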

Transformer for Gene Expression Modeling (T-GEM): An Interpretable Deep Learning Model for Gene Expression-Based Phenotype Predictions

From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer

DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes

Deep Learning Prediction of Ribosome Profiling with Translatomer Reveals Translational Regulation and Interprets Disease Variants

Biologically Informed Deep Learning to Infer Gene Program Activity in Single Cells

Predicting Gene Spatial Expression and Cancer Prognosis: An Integrated Graph and Image Deep Learning Approach Based on HE Slides

Transformer with convolution and graph-node co-embedding: An accurate and interpretable vision backbone for predicting gene expressions from local histopathological image

Harnessing TME depicted by histological images to improve cancer prognosis through a deep learning system

TransCell: In silico Characterization of Genomic Landscape and Cellular Responses by Deep Transfer Learning

Enhancing Personalized Gene Expression Prediction From DNA Sequences Using Genomic Foundation Models

Biology-guided deep learning predicts prognosis and cancer immunotherapy response

Multifaceted Representation of Genes via Deep Learning of Gene Expression Networks

Transfer learning enables predictions in network biology

Learning interpretable cellular embedding for inferring biological mechanisms underlying single-cell transcriptomics

Transformer-based deep learning integrates multi-omic data with cancer pathways

Inferring single-cell spatial gene expression with tissue morphology via explainable deep learning

TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response

Developing explainable models for lncRNA-Targeted drug discovery using graph autoencoders

Explainable Multilayer Graph Neural Network for Cancer Gene Prediction

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection