Abstract: Single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. As more such data sets become available, accurate cell type annotation with compatible and robust models becomes a prerequisite for their assessment. We therefore developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets that operates primarily at the single-cell cluster level, using a joint deconvolution strategy and logistic regression. To support this methodology, we constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues). scAnno identifies cell type-specific genes (marker genes), defined as genes with high expression and specificity in a given cell type, by combining co-expressed genes with seed genes as a core. Importantly, scAnno can accurately identify cell type-specific genes from cell type reference expression profiles without any prior information. In particular, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also demonstrated its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first methodology that uses a deconvolution strategy for automated cell typing, which may find significant application in broader scRNA-seq analysis.
scAnno is available at https://github.com/liuhong-jia/scAnno.
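
To make the idea of marker genes with "high expression and specificity" and of cluster-level annotation more concrete, the following is a minimal sketch in Python. It is not scAnno's implementation: it does not perform the joint deconvolution or the logistic regression described above, and the function names, thresholds and expression values are all hypothetical, chosen only for illustration. It assumes a reference profile given as mean expression per gene and cell type, and a query cluster summarized by its mean expression.

```python
from typing import Dict, List

# Toy reference profile: gene -> {cell type: mean expression}.
# All values are invented for illustration, not taken from scAnno's references.
reference: Dict[str, Dict[str, float]] = {
    "CD3E":  {"T cell": 8.0, "B cell": 0.5, "Monocyte": 0.5},
    "CD79A": {"T cell": 0.2, "B cell": 7.0, "Monocyte": 0.3},
    "LYZ":   {"T cell": 0.5, "B cell": 0.5, "Monocyte": 9.0},
    "ACTB":  {"T cell": 9.0, "B cell": 9.0, "Monocyte": 9.0},
}


def select_markers(
    ref: Dict[str, Dict[str, float]],
    min_expr: float = 1.0,          # hypothetical expression threshold
    min_specificity: float = 0.6,   # hypothetical specificity threshold
) -> Dict[str, List[str]]:
    """Keep genes that are both highly expressed and specific to one cell type.

    Specificity is taken here as the share of a gene's total expression that
    falls in its top cell type (an assumption of this sketch).
    """
    markers: Dict[str, List[str]] = {}
    for gene, profile in ref.items():
        total = sum(profile.values())
        if total <= 0:
            continue
        top_type = max(profile, key=profile.get)
        specificity = profile[top_type] / total
        if profile[top_type] >= min_expr and specificity >= min_specificity:
            markers.setdefault(top_type, []).append(gene)
    return markers


def annotate_cluster(
    cluster_mean: Dict[str, float], markers: Dict[str, List[str]]
) -> str:
    """Label a cluster with the cell type whose markers it expresses most."""
    scores = {
        cell_type: sum(cluster_mean.get(g, 0.0) for g in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    return max(scores, key=scores.get)


markers = select_markers(reference)

# Mean expression of one query cluster (invented values).
query_cluster = {"CD3E": 0.3, "CD79A": 6.1, "LYZ": 0.4, "ACTB": 8.7}
label = annotate_cluster(query_cluster, markers)
print(markers)
print(label)
```

In this toy example the ubiquitously expressed gene (ACTB) is rejected because its expression is spread evenly across cell types, while the three lineage genes are kept as markers. Scoring whole clusters rather than individual cells mirrors the cluster-level focus described in the abstract, although the scoring rule itself is a simplification.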

scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets
