A novel batch-effect correction method for scRNA-seq data based on Adversarial Information Factorization

Lily Monnier, Paul-Henry Cournède
DOI: https://doi.org/10.1371/journal.pcbi.1011880
2024-02-23
PLoS Computational Biology
Abstract: Single-cell RNA sequencing (scRNA-seq) technology offers unprecedented resolution at the level of the individual cell, raising great hopes in medicine. Nevertheless, scRNA-seq data suffer from high variation due to experimental conditions, known as batch effects, which prevents aggregated downstream analysis. Adversarial Information Factorization provides a robust batch-effect correction method that relies on neither prior knowledge of the cell types nor a specific normalization strategy, and that can be adapted to any downstream analysis task. It matches or outperforms state-of-the-art methods in several scenarios: low signal-to-noise ratio, batch-specific cell types with few cells, and a multi-batch dataset with imbalanced batches and batch-specific cell types. Moreover, it best preserves the relative gene expression between cell types, yielding superior differential expression analysis results. Finally, in the more complex setting of a leukemia cohort, our method preserved most of the underlying biological information for each patient while aligning the batches, improving the clustering metrics in the aggregated dataset.

Single-cell RNA sequencing captures the signal of individual cells, allowing a finer resolution than bulk sequencing, which is particularly important for studies involving rare cell populations, such as tumor heterogeneity or lineage tracing studies. However, it is sensitive to the experimental conditions, which induce a bias in the data, called batch effects. These technical variations hinder any aggregated analysis, limiting scRNA-seq to individual trials. To address this issue, we developed a novel deep-learning method called Adversarial Information Factorization, which aims to separate the batch effects from the biological signal so that individual trials can be aligned for aggregated downstream analysis. The model is trained to learn the batch-conditional distributions of the cells and then corrects batch effects by projecting all cells onto the same batch distribution.
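
The idea of learning batch-conditional distributions and then projecting every cell onto one reference batch can be illustrated with a deliberately simplified stand-in. The sketch below is not the Adversarial Information Factorization model: it swaps the adversarially trained network for a per-gene Gaussian (mean and standard deviation) fitted to each batch, and it assumes that all batches share the same cell-type composition, an assumption the method described above does not make. The function names `fit_batch_stats` and `project_to_reference` are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def fit_batch_stats(cells, batches):
    """Estimate a per-gene mean and standard deviation for each batch.

    Toy stand-in for "learning the batch-conditional distribution":
    each batch is summarized by an independent Gaussian per gene.
    """
    stats = {}
    for label in set(batches):
        rows = [c for c, b in zip(cells, batches) if b == label]
        genes = list(zip(*rows))  # transpose: one tuple of values per gene
        stats[label] = (
            [mean(g) for g in genes],
            [pstdev(g) or 1.0 for g in genes],  # avoid dividing by zero
        )
    return stats


def project_to_reference(cells, batches, reference):
    """Map every cell onto the distribution of the reference batch.

    Each cell is standardized under its own batch statistics and then
    rescaled with the reference batch statistics.
    """
    stats = fit_batch_stats(cells, batches)
    ref_mu, ref_sd = stats[reference]
    corrected = []
    for cell, label in zip(cells, batches):
        mu, sd = stats[label]
        corrected.append([
            (x - m) / s * rs + rm
            for x, m, s, rm, rs in zip(cell, mu, sd, ref_mu, ref_sd)
        ])
    return corrected


# Two genes, two batches; batch "b" carries an additive shift.
cells = [[1.0, 2.0], [3.0, 4.0], [11.0, 22.0], [13.0, 24.0]]
batches = ["a", "a", "b", "b"]
print(project_to_reference(cells, batches, reference="a"))
```

In this toy case the cells of batch "b" land on the same values as the cells of batch "a", while the reference batch is left unchanged. A location-scale correction of this kind breaks down when batches contain different cell types, which is one of the scenarios the abstract highlights for the adversarial approach.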
biochemical research methods, mathematical & computational biology