eDOC: Explainable Decoding Out-of-domain Cell Types with Evidential Learning

Chaochen Wu, Meiyun Zuo, Lei Xie
2024-10-31
Abstract: Single-cell RNA-seq (scRNA-seq) technology is a powerful tool for unraveling the complexity of biological systems. One of the essential and fundamental tasks in scRNA-seq data analysis is Cell Type Annotation (CTA). Despite tremendous efforts in developing machine learning methods for this problem, several challenges remain. They include identifying Out-of-Domain (OOD) cell types, quantifying the uncertainty of unseen cell type annotations, and determining interpretable cell type-specific gene drivers for an OOD case. OOD cell types are often associated with therapeutic responses and disease origins, making them critical for precision medicine and early disease diagnosis. Additionally, scRNA-seq data contains expression values for tens of thousands of genes. Pinpointing the gene drivers underlying CTA can provide deep insight into gene regulatory mechanisms, and these drivers can serve as disease biomarkers. In this study, we develop a new method, eDOC, to address the aforementioned challenges. eDOC leverages a transformer architecture with evidential learning to annotate In-Domain (IND) and OOD cell types, as well as to highlight the genes that contribute to both IND and OOD cells at single-cell resolution. Rigorous experiments demonstrate that eDOC significantly improves the efficiency and effectiveness of OOD cell type and gene driver identification compared to other state-of-the-art methods. Our findings suggest that eDOC may provide new insights into single-cell biology.
Genomics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper attempts to address three key challenges in single-cell RNA sequencing (scRNA-seq) data analysis:

1. **Identification of Out-of-Domain (OOD) Cells**: In practical applications, some cells may not belong to any known cell type. These unknown or unrecognized cells are referred to as OOD cells. Reliably identifying them matters for discovering new biological processes, and such cells can serve as biomarkers for precision medicine and disease diagnosis.
2. **Quantifying the Uncertainty of Cell Type Annotation**: In risk-sensitive medical applications, it is crucial to quantify the reliability and uncertainty of cell type annotations.
3. **Interpreting Predictions of New Cell Types**: To understand how new cell types arise and to interpret prediction results, it is necessary to highlight the marker genes that explain a new cell type.

The paper proposes a new method, eDOC (explainable Decoding Out-of-domain Cell Types), which uses a Transformer architecture and evidential learning to annotate known (In-Domain, IND) and unknown (OOD) cell types, and highlights the genes contributing to IND and OOD cells at single-cell resolution. According to the reported experiments, eDOC significantly outperforms existing methods in both the efficiency and the effectiveness of identifying OOD cell types and gene drivers.
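
To make the role of evidential learning in uncertainty quantification more concrete, the sketch below shows the generic Dirichlet-based formulation commonly used in evidential deep learning. It is not taken from the eDOC paper: the function names, the example evidence values, and the 0.5 threshold are all hypothetical, and eDOC's actual loss, architecture, and OOD decision rule may differ. The idea is that a model outputs non-negative evidence per known cell type, and that low total evidence translates into a high uncertainty mass, which can be used to flag a cell as potentially OOD.

```python
from typing import List, Tuple


def dirichlet_uncertainty(evidence: List[float]) -> Tuple[List[float], float, List[float]]:
    """Generic evidential (subjective-logic) quantities for one cell.

    evidence: non-negative evidence per known (IND) cell type, e.g. the
    output of a softplus/ReLU head. Assumed here, not specified by the source.
    Returns (belief masses, uncertainty mass, expected class probabilities).
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative values")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # Dirichlet parameters
    strength = sum(alpha)                      # Dirichlet strength S
    belief = [e / strength for e in evidence]  # b_k = e_k / S
    uncertainty = k / strength                 # u = K / S, so sum(b) + u = 1
    probs = [a / strength for a in alpha]      # expected probabilities
    return belief, uncertainty, probs


def flag_ood(evidence: List[float], threshold: float = 0.5) -> bool:
    """Flag a cell as potentially OOD when uncertainty exceeds a threshold.

    The threshold value is a hypothetical choice for illustration only.
    """
    _, uncertainty, _ = dirichlet_uncertainty(evidence)
    return uncertainty > threshold


# Hypothetical evidence vectors over three known cell types.
confident_cell = [40.0, 1.0, 1.0]  # strong evidence for the first type
unsure_cell = [0.2, 0.1, 0.3]      # almost no evidence for any known type

print(flag_ood(confident_cell))  # low uncertainty (3 / 45)
print(flag_ood(unsure_cell))     # high uncertainty (3 / 3.6)
```

In this formulation the belief masses and the uncertainty mass always sum to one, so a cell with little evidence for every known type is assigned most of its mass to "unknown" rather than being forced into the nearest known class.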