A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

Xiao Liang, Lijie Cao, Hao Chen, Lidan Wang, Yangyun Wang, Lijuan Fu, Xiaqin Tan, Enxiang Chen, Yubin Ding, Jing Tang
DOI: https://doi.org/10.1093/bib/bbad497
IF: 9.5
2024-01-12
Briefings in Bioinformatics
Abstract: Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting its sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the lack of a systematic evaluation of clustering algorithms for single-cell RNA sequencing (scRNA-seq) data. Different algorithms usually assign cells to different label sets, which introduces variation in downstream cell-type identification. The authors assess seven state-of-the-art clustering algorithms: four deep learning-based methods, plus the commonly used Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm (CosTaL), and Single-cell consensus clustering (SC3). Using diverse evaluation indices on 10 scRNA-seq benchmarks, they find that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC), and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster across nine effectiveness scores. CosTaL and DESC performed best at clustering specific cell types, and DESC also showed promising results for cell subtype identification. The performance of the single-cell Variational Inference tools varied across datasets, suggesting sensitivity to dataset characteristics. SC3 required more memory and ran more slowly than the other algorithms on the same dataset. Together, these findings offer guidance for selecting a clustering method in scRNA-seq data analysis.
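To make the idea of comparing label sets concrete, the sketch below scores how closely a clustering agrees with reference annotations using the Adjusted Rand Index (ARI). ARI is a widely used index for this purpose, but the abstract does not name the indices the authors used, so treating ARI as one of them is an assumption. The function name, the toy cell labels, and the two hypothetical methods are illustrative only and are not data from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two label sets over the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sets must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no cell pairs to disagree on

    # Count pairs of cells that share a cluster in both, in A only, in B only.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both label sets
    return (sum_joint - expected) / (max_index - expected)


# Toy example: reference cell types versus two hypothetical algorithms.
reference = ["T", "T", "T", "B", "B", "B"]
method_a = [0, 0, 0, 1, 1, 1]  # same partition, different label names
method_b = [0, 0, 1, 1, 2, 2]  # splits and mixes the reference types

score_a = adjusted_rand_index(reference, method_a)
score_b = adjusted_rand_index(reference, method_b)
print(f"method_a ARI = {score_a:.3f}")
print(f"method_b ARI = {score_b:.3f}")
```

Because ARI depends only on which cells are grouped together, it is unaffected by how clusters are named, which is why label sets from different algorithms can be compared directly against the same reference.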