IBRAP: Integrated Benchmarking Single-cell RNA-sequencing Analytical Pipeline

Knight, C. H., Khan, F., Gill, U., Wang, J.
DOI: https://doi.org/10.1101/2022.09.26.509481
2022-09-28
bioRxiv
Abstract: Single-cell RNA-sequencing (scRNA-seq) is a powerful tool for studying cellular heterogeneity. The high-dimensional data generated by this technology are complex and require specialised expertise to analyse and interpret. The core of scRNA-seq data analysis comprises several key analytical steps: pre-processing, QC, normalisation, dimensionality reduction, integration and clustering. For each step, many algorithms have been developed, each with different underlying assumptions and implications. Given this wide choice of tools, benchmarking studies have compared their performance and shown that tools perform differently depending on data type and complexity. Here, we present the Integrated Benchmarking scRNA-seq Analytical Pipeline (IBRAP), a tool that combines interchangeable analytical components at each stage of the pipeline with multiple benchmarking metrics, enabling users to compare results and determine the optimal pipeline combination for their data. We apply IBRAP to single-sample and multi-sample integration analyses using pancreas, cell line and simulated datasets with ground-truth cell labels, demonstrating its interchangeable and benchmarking functionality. Our results confirm that the optimal pipeline depends on the individual sample and study, further supporting the rationale for, and necessity of, our tool. We then compare reference-based cell annotation with unsupervised analysis, both included in IBRAP, and demonstrate that the reference-based method is superior at identifying robust major and minor cell types. IBRAP thus provides a valuable tool for integrating multiple samples and studies to create reference maps of normal and diseased tissues, facilitating novel biological discovery from the vast volume of scRNA-seq data available.
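The idea of interchangeable pipeline components scored against ground-truth labels can be illustrated with a small grid search. The sketch below is not IBRAP's interface: every function and option name (`benchmark_pipelines`, `mean_split`, `fixed_split`, the `"raw"`/`"log"` normalisations) is hypothetical, the one-gene toy data are invented for illustration, and the adjusted Rand index (ARI) is assumed here as one common clustering benchmark metric. It runs every combination of step options and ranks them by ARI.

```python
from collections import Counter
from itertools import product
from math import comb, log1p


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings (pure-Python sketch)."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: identical trivial partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def benchmark_pipelines(data, truth, steps, score=adjusted_rand_index):
    """Run every combination of interchangeable step options; rank by score."""
    step_names = list(steps)
    results = []
    for choice in product(*(steps[name].items() for name in step_names)):
        x = data
        for _, fn in choice:  # apply the chosen option for each step in order
            x = fn(x)
        combo = tuple(option for option, _ in choice)
        results.append((combo, score(truth, x)))
    # Stable sort: ties keep the order in which combinations were generated.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical clustering options on a single expression value per cell.
def mean_split(values):
    m = sum(values) / len(values)
    return [1 if v > m else 0 for v in values]


def fixed_split(values, threshold=10.0):
    return [1 if v > threshold else 0 for v in values]


# Toy data: counts of one marker gene in six cells, two true cell types.
counts = [1, 2, 1, 50, 60, 55]
truth = ["A", "A", "A", "B", "B", "B"]
steps = {
    "normalisation": {"raw": lambda v: list(v), "log": lambda v: [log1p(c) for c in v]},
    "clustering": {"mean_split": mean_split, "fixed_split": fixed_split},
}
results = benchmark_pipelines(counts, truth, steps)
for combo, ari in results:
    print(combo, round(ari, 3))
```

In this toy case the fixed threshold of 10 separates the raw counts perfectly but collapses the log-transformed values into one cluster (ARI 0), so whether a clustering choice works depends on the normalisation it follows. This mirrors, in miniature, the point that the best combination is data-dependent.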