scPanel: a tool for automatic identification of sparse gene panels for generalizable patient classification using scRNA-seq datasets

Yi Xie,Jianfei Yang,John F Ouyang,Enrico Petretto
DOI: https://doi.org/10.1093/bib/bbae482
IF: 9.5
2024-10-02
Briefings in Bioinformatics
Abstract:Single-cell RNA sequencing (scRNA-seq) technologies can generate transcriptomic profiles at a single-cell resolution in large patient cohorts, facilitating discovery of gene and cellular biomarkers for disease. Yet, when the number of biomarker genes is large, the translation to clinical applications is challenging due to prohibitive sequencing costs. Here, we introduce scPanel, a computational framework designed to bridge the gap between biomarker discovery and clinical application by identifying a sparse gene panel for patient classification from the cell population(s) most responsive to perturbations (e.g. diseases/drugs). scPanel incorporates a data-driven way to automatically determine a minimal number of informative biomarker genes. Patient-level classification is achieved by aggregating the prediction probabilities of cells associated with a patient using the area under the curve score. Application of scPanel to scleroderma, colorectal cancer, and COVID-19 datasets resulted in high patient classification accuracy using only a small number of genes (<20), automatically selected from the entire transcriptome. In the COVID-19 case study, we demonstrated cross-dataset generalizability in predicting disease state in an external patient cohort. scPanel outperforms other state-of-the-art gene selection methods for patient classification and can be used to identify parsimonious sets of reliable biomarker candidates for clinical translation.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?