Abstract: Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses, and clustering performance considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. Each algorithm generates a label set that assigns every cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variation in downstream cell-type identification. To date, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms: four deep learning-based algorithms and three commonly used methods, namely Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We systematically evaluated their clustering performance using diverse evaluation indices on 10 different scRNA-seq benchmarks. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster across nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across datasets, suggesting sensitivity to certain dataset characteristics. DESC also exhibited promising results for identifying cell subtypes and capturing cellular heterogeneity. In addition, SC3 required more memory and ran more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
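
The abstract notes that different algorithms usually yield different label sets for the same cells. As a minimal sketch of how agreement between two such label sets can be quantified, the snippet below computes the adjusted Rand index (ARI) in plain Python. The abstract does not name its evaluation indices, so the choice of ARI is an assumption, and the function name and the toy labels are hypothetical illustrations rather than anything taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two label sets over the same cells.

    Assumption: ARI is used here only as a common example of a clustering
    agreement score; the abstract does not state which indices were used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label sets must cover the same cells.")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no cell pairs to disagree on

    # Number of cell pairs placed together in both label sets.
    together_in_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values()
    )
    # Number of cell pairs placed together within each label set.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # chance-level agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both sets are one single cluster
    return (together_in_both - expected) / (maximum - expected)


# Hypothetical toy labels for six cells from two clustering runs.
run_one = [0, 0, 0, 1, 1, 1]
run_two = ["b", "b", "b", "a", "a", "c"]
print(adjusted_rand_index(run_one, run_two))
```

ARI is invariant to how clusters are named, so two runs that partition the cells identically score 1 even when their cluster identifiers differ, while values near 0 indicate agreement no better than chance.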

Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data

A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods

Analysis of Single-Cell RNA-seq Data by Clustering Approaches

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Single-cell RNA-seq Data Clustering: A Survey with Performance Comparison Study

Single-cell RNA-seq clustering: datasets, models, and algorithms

Single-Cell Transcriptome Profiling Simulation Reveals the Impact of Sequencing Parameters and Algorithms on Clustering

Evaluating Imputation Methods for Single-Cell RNA-seq Data

CTEC: a cross-tabulation ensemble clustering approach for single-cell RNA sequencing data analysis

Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data

A Cell Marker-Based Clustering Strategy (cmcluster) for Precise Cell Type Identification of scRNA-seq Data

Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data

A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

scRNA-seq mixology: towards better benchmarking of single cell RNA-seq analysis methods

Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

Consensus-based clustering of single cells by reconstructing cell-to-cell dissimilarity

SCSMD: Single Cell Consistent Clustering based on Spectral Matrix Decomposition

A comparison of automatic cell identification methods for single-cell RNA sequencing data

Benchmarking cell-type clustering methods for spatially resolved transcriptomics data