Abstract: Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses, and its performance considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in the cell types identified from them. The performance of these algorithms has not yet been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms: four deep learning-based algorithms and three commonly used methods, namely Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices on 10 different scRNA-seq benchmark datasets to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed the Single-Cell Clustering Assessment Framework (SCCAF) and scDeepCluster on nine effectiveness scores. CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of single-cell Variational Inference (scVI) varied across datasets, suggesting that it is sensitive to certain dataset characteristics. Notably, DESC exhibited promising results for identifying cell subtypes and capturing cellular heterogeneity. In addition, SC3 requires more memory and runs more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
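The abstract refers to evaluation indices and nine effectiveness scores without naming them. As an illustration only, the sketch below computes two external indices that are widely used in clustering benchmarks, the adjusted Rand index (ARI) and normalized mutual information (NMI). Both compare a method's label set with a reference annotation and ignore how the clusters happen to be named. It is an assumption that these two are representative of the paper's scores; the function names and toy labels are hypothetical. The code is plain Python so that it is self-contained; scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` (with its default arithmetic averaging) compute the same quantities.

```python
from collections import Counter
from math import comb, log


def _contingency(labels_true, labels_pred):
    """Return n, the pair counts n_ij, and the marginal counts of both label sets."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sets must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two cells")
    return (
        len(labels_true),
        Counter(zip(labels_true, labels_pred)),  # n_ij: cells in class i and cluster j
        Counter(labels_true),                    # reference class sizes
        Counter(labels_pred),                    # predicted cluster sizes
    )


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting agreement corrected for chance (1.0 = identical partitions)."""
    n, pairs, a, b = _contingency(labels_true, labels_pred)
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are a single cluster or all
        # singletons, i.e. they are identical up to relabeling.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n, pairs, a, b = _contingency(labels_true, labels_pred)
    mi = sum(c / n * log(n * c / (a[i] * b[j])) for (i, j), c in pairs.items())
    h_true = -sum(c / n * log(c / n) for c in a.values())
    h_pred = -sum(c / n * log(c / n) for c in b.values())
    mean_h = (h_true + h_pred) / 2
    if mean_h == 0:
        # Both label sets put every cell in one cluster.
        return 1.0
    return mi / mean_h


if __name__ == "__main__":
    truth = ["T", "T", "T", "B", "B", "B"]  # hypothetical reference annotation
    method_a = [1, 1, 1, 0, 0, 0]           # same partition, different label names
    method_b = [0, 0, 1, 1, 2, 2]           # splits and mixes the two cell types
    for name, pred in [("method_a", method_a), ("method_b", method_b)]:
        print(name,
              round(adjusted_rand_index(truth, pred), 3),
              round(normalized_mutual_info(truth, pred), 3))
```

Because both indices depend only on which cells are grouped together, `method_a` scores 1.0 on both despite using different label names. `method_b`, which splits each reference type across clusters, scores lower (ARI about 0.24, NMI about 0.52).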

What problem does this paper attempt to address?

The paper addresses cell clustering and identification in single-cell transcriptomics research. Specifically, it evaluates the performance of seven state-of-the-art clustering algorithms on single-cell RNA sequencing (scRNA-seq) data: four deep learning-based clustering algorithms and three commonly used methods, Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm (CosTaL), and Single-cell consensus clustering (SC3). Using multiple evaluation metrics on 10 benchmark datasets, the paper concludes that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC), and SC3 consistently outperform the Single-Cell Clustering Assessment Framework (SCCAF) and scDeepCluster. In particular, CosTaL and DESC excel at clustering specific cell types. SC3, however, requires more memory and runs more slowly than the other algorithms. The findings provide useful guidance for selecting clustering methods for scRNA-seq data analysis. In short, the paper aims to improve cell clustering and identification in single-cell transcriptomics research by systematically comparing clustering algorithms.

A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

Single-cell RNA-seq Data Clustering: A Survey with Performance Comparison Study

Analysis of Single-Cell RNA-seq Data by Clustering Approaches

Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data

Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Single-Cell Transcriptome Profiling Simulation Reveals the Impact of Sequencing Parameters and Algorithms on Clustering

Single-cell RNA-seq clustering: datasets, models, and algorithms

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

Deep Learning for clustering single-cell RNA-seq Data

A Cell Marker-Based Clustering Strategy (cmcluster) for Precise Cell Type Identification of scRNA-seq Data

Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis

SCSMD: Single Cell Consistent Clustering based on Spectral Matrix Decomposition

Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

Clustering single cells: a review of approaches on high- and low-depth single-cell RNA-seq data

Robust scRNA-seq Cell Types Identification by Self-Guided Deep Clustering Network

Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods

Deep learning enables accurate clustering and batch effect removal in single-cell RNA-seq analysis

Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation

Clustering single-cell RNA-seq data with a model-based deep learning approach