Hierarchical marker genes selection in scRNA-seq analysis

Yutong Sun, Peng Qiu
DOI: https://doi.org/10.1371/journal.pcbi.1012643
2024-12-13
PLoS Computational Biology
Abstract: When analyzing scRNA-seq data containing heterogeneous cell populations, an important task is to select informative marker genes to distinguish various cell clusters and annotate the clusters with biologically meaningful cell types. In existing analysis methods and pipelines, marker genes are typically identified using a one-vs-all strategy, examining differential expression between one cell cluster and the combination of all other cell clusters. However, when applied to cell clusters belonging to closely related cell types, this strategy often generates overlapping marker genes, which capture the common signature of the closely related clusters but provide limited information for distinguishing them. To address the limitations of the one-vs-all strategy, we propose a hierarchical marker gene selection strategy that groups similar cell clusters and selects marker genes in a hierarchical manner. This strategy can improve the accuracy and interpretability of cell type identification in single-cell RNA-seq data.

Author summary: In the analysis and interpretation of scRNA-seq data, one important step is to identify marker genes to annotate cell clusters with biologically meaningful names. Existing marker gene selection methods typically perform differential expression analysis between one cell cluster and all other clusters combined. Ideally, marker genes for one cell cluster should be highly expressed in that cluster and lowly expressed in the other cell clusters. However, when some cell clusters correspond to closely related cell types, the one-vs-all approach often introduces overlapping marker genes that represent the commonality among the closely related cell types but provide limited information to interpret their differences. Here we organize cell clusters in a hierarchical manner and define marker genes at all levels of the hierarchy. Our approach provides marker genes not only for individual clusters but also for lineages defined by closely related clusters. The proposed hierarchical marker genes better separate cell types and better facilitate cell type annotation across datasets in biological contexts that contain closely related cell types.
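The contrast between the one-vs-all strategy and the hierarchical strategy can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes made-up per-cluster mean expression values, hypothetical gene and cluster names (`G_shared`, `A1`, `A2`, `B`), a hierarchy that is supplied by hand rather than learned from the data, and a plain difference of means in place of a proper differential expression test.

```python
# Toy per-cluster mean expression. All values and names are invented for
# illustration; they are not data from the paper.
MEANS = {
    "A1": {"G_shared": 9.0, "G_a1": 5.0, "G_a2": 1.0, "G_b": 0.0},
    "A2": {"G_shared": 9.0, "G_a1": 1.0, "G_a2": 5.0, "G_b": 0.0},
    "B":  {"G_shared": 0.0, "G_a1": 1.0, "G_a2": 1.0, "G_b": 8.0},
}
GENES = ["G_shared", "G_a1", "G_a2", "G_b"]

# Assumed hierarchy: A1 and A2 are closely related and form one lineage.
HIERARCHY = [["A1", "A2"], "B"]


def group_mean(clusters, gene):
    """Unweighted average of a gene's cluster means over a set of clusters."""
    return sum(MEANS[c][gene] for c in clusters) / len(clusters)


def top_marker(target, background):
    """Gene with the largest mean difference, target minus background."""
    return max(GENES, key=lambda g: group_mean(target, g) - group_mean(background, g))


def one_vs_all():
    """Top marker of each cluster against all other clusters combined."""
    return {c: top_marker([c], [o for o in MEANS if o != c]) for c in MEANS}


def leaves(node):
    """All cluster names below a node of the nested-list hierarchy."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def hierarchical(node, markers=None):
    """At every split, compare each child only against its siblings."""
    markers = {} if markers is None else markers
    if isinstance(node, str):
        return markers
    for i, child in enumerate(node):
        siblings = [leaf for j, other in enumerate(node) if j != i
                    for leaf in leaves(other)]
        markers["+".join(leaves(child))] = top_marker(leaves(child), siblings)
        hierarchical(child, markers)
    return markers


flat = one_vs_all()
tree = hierarchical(HIERARCHY)
print("one-vs-all:  ", flat)
print("hierarchical:", tree)
```

In this toy setting the one-vs-all comparison assigns the shared gene to both `A1` and `A2`, while the hierarchical comparison assigns it to the `A1+A2` lineage and selects distinct genes for the two clusters within that lineage.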
biochemical research methods, mathematical & computational biology