Improved gene regulatory network inference from single cell data with dropout augmentation

Hao Zhu, Donna Slonim
DOI: https://doi.org/10.1101/2023.01.26.525733
2024-10-15
Abstract: A major challenge in working with single-cell RNA sequencing data is the prevalence of "dropout", when some transcripts' expression values are erroneously not captured. Addressing this issue, which produces zero-inflated count data, is crucial for many downstream data analyses, including the inference of gene regulatory networks (GRNs). In this paper, we introduce two novel contributions. First, we propose Dropout Augmentation (DA), a simple but effective model regularization method that addresses the zero-inflation problem in single-cell data by augmenting the data with synthetic dropout events. DA offers a new perspective on the "dropout" problem beyond imputation. Second, we present DAZZLE, a stabilized and robust version of the autoencoder-based structural equation model for GRN inference that uses the DA concept. Benchmark experiments illustrate the improved performance and increased stability of the proposed DAZZLE model over existing approaches. The practical application of the DAZZLE model to a longitudinal mouse microglia dataset containing over 15,000 genes illustrates its ability to handle real-world single-cell data with minimal gene filtration. The improved robustness and stability of DAZZLE make it a practical and valuable addition to the toolkit for GRN inference from single-cell data. Finally, we propose that Dropout Augmentation may have wider applications beyond the GRN-inference problem. Project website: https://bcb.cs.tufts.edu/DAZZLE.
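The core idea of Dropout Augmentation, adding synthetic dropout events to the data, can be illustrated with a minimal sketch. This is not the DAZZLE implementation: the function name `augment_with_dropout`, the `rate` parameter, the fixed per-entry zeroing probability, and the toy matrix are all assumptions made for illustration, and the paper's actual augmentation schedule and how the model handles the augmented entries may differ.

```python
import random


def augment_with_dropout(expr, rate=0.1, seed=None):
    """Return a copy of `expr` with entries independently zeroed.

    `expr` is a cells-by-genes matrix given as a list of rows.
    Each entry is replaced by 0.0 with probability `rate`
    (hypothetical parameter, not taken from the paper).
    """
    rng = random.Random(seed)
    return [
        [0.0 if rng.random() < rate else value for value in row]
        for row in expr
    ]


# Toy expression matrix: 3 cells x 4 genes (illustrative values only).
expr = [
    [5.0, 0.0, 2.0, 7.0],
    [1.0, 3.0, 0.0, 4.0],
    [6.0, 2.0, 8.0, 0.0],
]

# A fresh augmented copy would typically be drawn at each training step,
# so the model sees different simulated dropout patterns over time.
noisy = augment_with_dropout(expr, rate=0.25, seed=0)
```

In a training loop, one would presumably pass `noisy` rather than `expr` to the model at each iteration, so that the model is regularized against zeros that do not reflect true absence of expression.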
Bioinformatics