A Flexible Network-Based Imputing-and-fusing Approach Towards the Identification of Cell Types from Single-Cell RNA-seq Data

Yang Qi,Yang Guo,Huixin Jiao,Xuequn Shang
DOI: https://doi.org/10.1186/s12859-020-03547-w
IF: 3.307
2020-01-01
BMC Bioinformatics
Abstract:BACKGROUND:Single-cell RNA sequencing (scRNA-seq) provides an effective tool to investigate the transcriptomic characteristics at the single-cell resolution. Due to the low amounts of transcripts in single cells and the technical biases in experiments, the raw scRNA-seq data usually includes large noise and makes the downstream analyses complicated. Although many methods have been proposed to impute the noisy scRNA-seq data in recent years, few of them take into account the prior associations across genes in imputation and integrate multiple types of imputation data to identify cell types.RESULTS:We present a new framework, NetImpute, towards the identification of cell types from scRNA-seq data by integrating multiple types of biological networks. We employ a statistic method to detect the noise data items in scRNA-seq data and develop a new imputation model to estimate the real values of data noise by integrating the PPI network and gene pathways. Meanwhile, based on the data imputed by multiple types of biological networks, we propose an integrated approach to identify cell types from scRNA-seq data. Comprehensive experiments demonstrate that the proposed network-based imputation model can estimate the real values of noise data items accurately and integrating the imputation data based on multiple types of biological networks can improve the identification of cell types from scRNA-seq data.CONCLUSIONS:Incorporating the prior gene associations in biological networks can potentially help to improve the imputation of noisy scRNA-seq data and integrating multiple types of network-based imputation data can enhance the identification of cell types. The proposed NetImpute provides an open framework for incorporating multiple types of biological network data to identify cell types from scRNA-seq data.
What problem does this paper attempt to address?