scDetect: a rank-based ensemble learning algorithm for cell type identification of single-cell RNA sequencing in cancer

Yifei Shen, Qinjie Chu, Michael P. Timko, Longjiang Fan
DOI: https://doi.org/10.1093/bioinformatics/btab410
IF: 5.8
2021-01-01
Bioinformatics
Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of different cell types in many tissues and tumor samples. Cell type identification is essential for single-cell RNA profiling, which is currently transforming the life sciences. Often, this is achieved by searching for combinations of genes that have previously been implicated as cell-type specific, an approach that is not quantitative and does not explicitly take advantage of other scRNA-seq studies. Batch effects and differences between data platforms greatly decrease predictive performance when methods are validated across laboratories and data types. Results: Here, we present a new ensemble learning method named 'scDetect' that combines gene expression rank-based analysis with a majority-vote, probability-based ensemble machine-learning prediction method and is capable of highly accurate classification of cells based on scRNA-seq data from different sequencing platforms. Because of tumor heterogeneity, we have also incorporated cell copy number variation consensus clustering and an epithelial score into the classification in order to accurately predict tumor cells in scRNA-seq data. We applied scDetect to scRNA-seq data from pancreatic tissue, mononuclear cells and tumor biopsy cells and show that scDetect classified individual cells with high accuracy and outperformed other publicly available tools.
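The rank-based, majority-vote idea in the abstract can be illustrated with a minimal sketch. This is not the scDetect implementation: the gene-pair rules, the toy expression values and the function names (`rank_genes`, `pair_vote`, `majority_vote`) are all hypothetical, and pairwise rank comparison is swapped in here as one simple way to build rank-based base classifiers. The point it shows is that within-cell ranks are unchanged by any monotone rescaling of the counts, which is one reason rank-based features tend to be less sensitive to batch and platform effects.

```python
from collections import Counter


def rank_genes(expression):
    """Map each gene to its within-cell rank (0 = lowest expression).

    Ties are broken alphabetically by gene name so the result is deterministic.
    """
    ordered = sorted(expression, key=lambda g: (expression[g], g))
    return {gene: rank for rank, gene in enumerate(ordered)}


def pair_vote(ranks, rule):
    """One base classifier: compare the ranks of two genes in a single cell."""
    gene_a, gene_b, label_if_a_higher, label_otherwise = rule
    return label_if_a_higher if ranks[gene_a] > ranks[gene_b] else label_otherwise


def majority_vote(expression, rules):
    """Return the winning label and the fraction of base classifiers backing it."""
    ranks = rank_genes(expression)
    votes = Counter(pair_vote(ranks, rule) for rule in rules)
    label, count = votes.most_common(1)[0]
    return label, count / len(rules)


# Hypothetical rules built from well-known T-cell and B-cell marker genes.
# A real method would learn such rules from labeled reference data.
RULES = [
    ("CD3E", "CD19", "T cell", "B cell"),
    ("CD2", "MS4A1", "T cell", "B cell"),
    ("CD3D", "CD79A", "T cell", "B cell"),
]

# Toy expression values for one cell (illustrative, not real data).
cell = {"CD3E": 12.0, "CD19": 0.0, "CD2": 7.0,
        "MS4A1": 1.0, "CD3D": 0.0, "CD79A": 2.0}

label, support = majority_vote(cell, RULES)

# The same cell after a monotone rescaling, as a stand-in for a platform shift.
rescaled = {gene: 50.0 * value + 3.0 for gene, value in cell.items()}
rescaled_label, rescaled_support = majority_vote(rescaled, RULES)
```

In this toy case two of the three rules vote for the same label, so `support` acts as a crude vote-fraction confidence, and the rescaled cell receives the same label and support because its gene ordering is unchanged.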