Abstract: Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate eleven methods for handling batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperforms approaches that use a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate-variable-based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except for SVA) deteriorates substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2).
Finally, we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables as covariates instead of using batch-corrected data; and 2) employ SVA for latent batch correction. We also note that better methods are still needed to fully unleash the power of scRNA-seq.
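To illustrate the first recommendation, the following is a minimal sketch of why including a known batch variable as a regression covariate matters when batch and group are partially confounded. It is a toy example and not the benchmark used in the study: the simulated data, the effect sizes, the cell counts per group and batch, and the helper names `solve_linear`, `ols`, and `simulate` are all assumptions made for illustration. A single continuous value stands in for one gene's log-expression, and plain least squares stands in for whichever DE regression model is actually used.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def simulate(seed=0):
    """Toy data (assumed): no true group effect, batch effect of 2.0.

    Group 1 cells come mostly from batch 1, so batch confounds group.
    """
    rng = random.Random(seed)
    layout = {(0, 0): 40, (0, 1): 10, (1, 0): 10, (1, 1): 40}
    cells = []
    for (group, batch), n in layout.items():
        for _ in range(n):
            value = 1.0 + 2.0 * batch + rng.gauss(0.0, 0.3)
            cells.append((group, batch, value))
    return cells


cells = simulate()
y = [value for _, _, value in cells]

# Model 1: intercept + group (batch ignored).
naive = ols([[1.0, g] for g, _, _ in cells], y)
# Model 2: intercept + group + batch (batch as a covariate).
adjusted = ols([[1.0, g, b] for g, b, _ in cells], y)

print(f"group effect, batch ignored:   {naive[1]:.3f}")
print(f"group effect, batch covariate: {adjusted[1]:.3f}")
```

In this toy setting the model that ignores batch attributes part of the batch shift to the group, while the model with the batch covariate estimates a group effect close to zero, which is the simulated truth. Latent batches are the harder case discussed above, because the batch column needed for the second model is not observed.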

Benchmarking UMI-based single cell RNA-sequencing preprocessing workflows

Benchmarking UMI-based single-cell RNA-seq preprocessing workflows

scRNA-seq mixology: towards better benchmarking of single cell RNA-seq analysis methods

A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples

Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments

Comparative Analysis of NovaSeq 6000 and MGISEQ 2000 Single-Cell RNA Sequencing Data

Comparison of high-throughput single-cell RNA sequencing data processing pipelines

A systematic evaluation of single cell RNA-seq analysis pipelines

Practical bioinformatics pipelines for single-cell RNA-seq data analysis

Systematic comparative analysis of single cell RNA-sequencing methods

scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing

A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data

IBRAP: Integrated Benchmarking Single-cell RNA-sequencing Analytical Pipeline

Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

Comparative Analysis of Commercial Single-Cell RNA Sequencing Technologies

Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance

Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods

The impact of package selection and versioning on single-cell RNA-seq analysis

Beyond benchmarking: towards predictive models of dataset-specific single-cell RNA-seq pipeline performance

Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview

A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing