Comparison of high-throughput single-cell RNA sequencing data processing pipelines

Mingxuan Gao, Mingyi Ling, Xinwei Tang, Shun Wang, Xu Xiao, Ying Qiao, Wenxian Yang, Rongshan Yu
DOI: https://doi.org/10.1093/bib/bbaa116
IF: 9.5
2020-07-07
Briefings in Bioinformatics
Abstract: With the development of single-cell RNA sequencing (scRNA-seq) technology, it has become possible to perform large-scale transcript profiling for tens of thousands of cells in a single experiment. Many analysis pipelines have been developed for data generated from different high-throughput scRNA-seq platforms, which leaves users with the new challenge of choosing a workflow that is efficient, robust and reliable for a specific sequencing platform. Moreover, as the amount of public scRNA-seq data has increased rapidly, integrated analysis of scRNA-seq data from different sources has become increasingly popular. However, it remains unclear whether such integrated analysis would be biased if the data were processed by different upstream pipelines. In this study, we encapsulated seven existing high-throughput scRNA-seq data processing pipelines with Nextflow, a general integrative workflow management framework, and evaluated their performance in terms of running time, computational resource consumption and data analysis consistency using eight public datasets generated from five different high-throughput scRNA-seq platforms. Our work provides a useful guideline for the selection of scRNA-seq data processing pipelines based on their performance on different real datasets. In addition, these guidelines can serve as a performance evaluation framework for future developments in high-throughput scRNA-seq data processing.
biochemical research methods, mathematical & computational biology