Single-cell RNA-seq Data Clustering: A Survey with Performance Comparison Study

Ruiyi Li,Jihong Guan,Shuigeng Zhou
DOI: https://doi.org/10.1142/s0219720020400053
2020-01-01
Journal of Bioinformatics and Computational Biology
Abstract:Clustering analysis has been widely applied to single-cell RNA-sequencing (scRNA-seq) data to discover cell types and cell states. Algorithms developed in recent years have greatly helped the understanding of cellular heterogeneity and the underlying mechanisms of biological processes. However, these algorithms often use different techniques, were evaluated on different datasets and compared with some of their counterparts usually using different performance metrics. Consequently, there lacks an accurate and complete picture of their merits and demerits, which makes it difficult for users to select proper algorithms for analyzing their data. To fill this gap, we first do a review on the major existing scRNA-seq data clustering methods, and then conduct a comprehensive performance comparison among them from multiple perspectives. We consider 13 state of the art scRNA-seq data clustering algorithms, and collect 12 publicly available real scRNA-seq datasets from the existing works to evaluate and compare these algorithms. Our comparative study shows that the existing methods are very diverse in performance. Even the top-performance algorithms do not perform well on all datasets, especially those with complex structures. This suggests that further research is required to explore more stable, accurate, and efficient clustering algorithms for scRNA-seq data.
What problem does this paper attempt to address?