A comparison of marker gene selection methods for single-cell RNA sequencing data

Jeffrey M. Pullin, Davis J. McCarthy
DOI: https://doi.org/10.1186/s13059-024-03183-0
IF: 17.906
2024-02-29
Genome Biology
Abstract: The development of single-cell RNA sequencing (scRNA-seq) has enabled scientists to catalog and probe the transcriptional heterogeneity of individual cells in unprecedented detail. A common step in the analysis of scRNA-seq data is the selection of so-called marker genes, most commonly to enable annotation of the biological cell types present in the sample. In this paper, we benchmark 59 computational methods for selecting marker genes in scRNA-seq data.
genetics & heredity, biotechnology & applied microbiology
What problem does this paper attempt to address?
This paper evaluates and compares methods for selecting marker genes in single-cell RNA sequencing (scRNA-seq) data. Its specific objectives are:

1. **Method evaluation**: Benchmark 59 computational methods for selecting marker genes from scRNA-seq data, covering both traditional statistical methods and modern machine-learning methods.
2. **Performance comparison**: Compare the methods on 14 real scRNA-seq datasets and over 170 simulated datasets, in terms of:
   - **Ability to recover known marker genes**: whether a method identifies marker genes annotated by experts.
   - **Predictive performance**: the predictive ability of the selected gene sets.
   - **Memory usage and speed**: the computational efficiency of the method.
   - **Implementation quality**: the quality of the method's software implementation.
3. **Analysis of method characteristics**: Characterize the marker genes selected by different methods, including their expression levels and the proportion of up-regulated versus down-regulated genes.
4. **Method consistency**: Evaluate how consistent the selected marker genes are across methods. For example, the paper reports low overlap between the marker genes selected by two commonly used tools, Scanpy and Seurat, and notes that their gene rankings also differ substantially.
5. **Best-practice recommendations**: Use the benchmark results to give researchers best-practice guidance for selecting marker genes. The paper specifically points out that simple statistical methods such as the Wilcoxon rank-sum test, Student's t-test, and logistic regression perform well.

Overall, the paper aims to provide a scientific basis and practical guidance for marker gene selection in scRNA-seq analysis through comprehensive benchmarking.
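To make the evaluation criteria more concrete, the following is a minimal sketch of one-vs-rest marker ranking with a rank-sum statistic, followed by two of the comparisons described above: recovery of known markers and overlap between two selections. It is an illustration under stated assumptions, not the paper's benchmark pipeline and not the Seurat or Scanpy implementation. The expression values, gene names, cluster labels, the "known" marker list, and the helper names (`auc_one_vs_rest`, `rank_markers`, `recall_at_k`, `jaccard`) are all hypothetical.

```python
from itertools import product

def auc_one_vs_rest(in_group, out_group):
    """Mann-Whitney U (the rank-sum statistic) scaled to [0, 1].

    Counts the pairs where the in-cluster cell has higher expression;
    ties count one half. 1.0 means perfectly up-regulated in the cluster.
    """
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(in_group, out_group)
    )
    return wins / (len(in_group) * len(out_group))

def rank_markers(expr, labels, cluster):
    """Rank genes for one cluster by one-vs-rest AUC, highest first."""
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        scores[gene] = auc_one_vs_rest(inside, outside)
    return sorted(scores, key=scores.get, reverse=True), scores

def recall_at_k(ranked, known, k):
    """Fraction of known markers found among the top-k ranked genes."""
    return len(set(ranked[:k]) & set(known)) / len(known)

def jaccard(a, b):
    """Overlap between two gene selections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy data, invented for illustration: 6 cells in 2 clusters, 4 genes.
labels = ["T", "T", "T", "B", "B", "B"]
expr = {
    "GENE_A": [5, 6, 7, 0, 1, 0],
    "GENE_B": [0, 1, 0, 4, 5, 6],
    "GENE_C": [2, 2, 2, 2, 2, 2],
    "GENE_D": [3, 4, 1, 2, 0, 1],
}

ranked, scores = rank_markers(expr, labels, "T")
known_markers = ["GENE_A", "GENE_C"]       # hypothetical expert annotation
other_selection = ["GENE_A", "GENE_C"]     # hypothetical second method's top 2

print(ranked)
print(recall_at_k(ranked, known_markers, k=2))
print(jaccard(ranked[:2], other_selection))
```

The pairwise AUC computation is quadratic in the number of cells and is only suitable for toy inputs; on real data a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.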