AGImpute: imputation of scRNA-seq data based on a hybrid GAN with dropouts identification
Xiaoshu Zhu, Shuang Meng, Gaoshi Li, Jianxin Wang, Xiaoqing Peng
DOI: https://doi.org/10.1093/bioinformatics/btae068
IF: 5.8
2024-02-01
Bioinformatics
Abstract:
Motivation: Dropout events complicate the analysis of single-cell RNA sequencing (scRNA-seq) data because they introduce noise and distort the true distributions of gene expression profiles. Recent studies focus on estimating dropout probabilities and imputing dropout events by leveraging information from similar cells or genes. However, the number of dropout events differs from cell to cell owing to complex factors such as sequencing protocol, cell type, and batch effects. These differences are not fully considered when assessing similarities between cells and genes, which compromises the reliability of downstream analysis.
Results: This work proposes AGImpute, a hybrid Generative Adversarial Network (GAN) with dropout identification for imputing scRNA-seq data. First, the number of dropout events in each cell is estimated separately using a dynamic threshold estimation strategy. Next, the identified dropout events are imputed by a hybrid deep learning model that combines an autoencoder with a GAN. To evaluate AGImpute, it is compared with seven state-of-the-art dropout imputation methods on two simulated datasets and seven real scRNA-seq datasets. The results show that AGImpute imputes fewer dropout events than the other methods. Moreover, AGImpute improves downstream analyses, including clustering, identification of cell-specific marker genes, and trajectory inference on a time-course dataset.
Availability and implementation: The source code is available at https://github.com/xszhu-lab/AGImpute.
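The key design idea in the abstract is that dropouts are first identified with a cell-specific (dynamic) threshold, and only those identified entries are then imputed. The abstract does not give the actual threshold rule, so the sketch below uses a hypothetical stand-in rule and is not AGImpute's method. In this rule, a zero in cell c is flagged when its gene's detection rate across all cells is at least the q-quantile of the detection rates of the genes that cell does express. The function names `flag_dropouts` and `impute_flagged` and the parameter `q` are illustrative assumptions. In AGImpute the replacement values would come from the autoencoder/GAN model; here `estimate` is just any array of the same shape.

```python
import numpy as np


def flag_dropouts(counts: np.ndarray, q: float = 0.1) -> np.ndarray:
    """Hypothetical per-cell dropout flagging (not AGImpute's actual rule).

    Returns a boolean mask (cells x genes) marking zeros treated as likely dropouts.
    """
    detected = counts > 0
    gene_rate = detected.mean(axis=0)  # fraction of cells in which each gene is detected
    mask = np.zeros_like(detected)     # boolean array, all False
    for c in range(counts.shape[0]):
        expressed_rates = gene_rate[detected[c]]
        if expressed_rates.size == 0:
            continue  # an all-zero cell carries no information; flag nothing
        t_c = np.quantile(expressed_rates, q)  # cell-specific (dynamic) threshold
        # Flag zeros in genes that are detected at least as often as t_c;
        # genes never detected anywhere (rate 0) are kept as true zeros.
        mask[c] = (~detected[c]) & (gene_rate >= t_c)
    return mask


def impute_flagged(counts: np.ndarray, mask: np.ndarray, estimate: np.ndarray) -> np.ndarray:
    """Replace only flagged entries; observed values and unflagged zeros are kept."""
    return np.where(mask, estimate, counts)


if __name__ == "__main__":
    counts = np.array([[5, 3, 0, 0],
                       [4, 0, 2, 0],
                       [6, 2, 1, 0]])
    mask = flag_dropouts(counts, q=0.0)
    print(mask.sum(axis=1))  # per-cell flagged counts: [1 1 0]
    print(impute_flagged(counts, mask, np.full(counts.shape, 9)))
```

Because the threshold is computed per cell, different cells end up with different numbers of flagged dropouts, which mirrors the motivation stated in the abstract. The gene that is zero in every cell is never flagged.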
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology