scNMF-Impute: imputation for single-cell RNA-seq data based on nonnegative matrix factorization.

Juan Wang, Na-Na Zhang, Junliang Shang, Jin-Xing Liu
DOI: https://doi.org/10.1109/BIBM58861.2023.10385409
2023-01-01
Abstract: Single-cell RNA sequencing (scRNA-seq) data are being collected at an unprecedented rate thanks to advances in high-throughput sequencing technologies. However, due to the limitations of current technology, scRNA-seq sometimes fails to capture expressed genes, resulting in a large number of zero counts (also known as dropout events) in the data. These dropout events cause data loss in the gene expression matrix and severely hamper the accuracy of downstream analysis. To address this problem, we propose a new imputation method called scNMF-impute. The scNMF-impute method imputes the dropout events and performs dimensionality reduction under the framework of nonnegative matrix factorization (NMF). To effectively identify the locations of dropout events and recover their values, we explicitly model the dropout events as a matrix. The gene expression matrix without dropout is then represented as the sum of the original data matrix and the dropout matrix. In addition, to reduce the influence of dropout on the factorization, we introduce similarity information between genes into the NMF model. This gene similarity information helps recover the data structures obscured by dropout events in the gene expression matrix. We conducted extensive experiments on simulated datasets and real scRNA-seq datasets to evaluate scNMF-impute against other state-of-the-art methods. The results show that scNMF-impute can accurately estimate missing values and restore true gene expression, thus improving the accuracy of existing clustering methods and yielding more accurate cell clustering results.
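The abstract does not state the objective function or the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a genes-by-cells matrix X, a dropout matrix S that is nonnegative and supported only on the zero entries of X, a cosine gene-gene similarity matrix A, and standard graph-regularized multiplicative NMF updates. The function name `nmf_impute_sketch` and the parameters `k`, `lam`, and `tau` are hypothetical.

```python
import numpy as np

def nmf_impute_sketch(X, k=2, lam=0.1, tau=0.1, n_iter=200, seed=0, eps=1e-10):
    """Illustrative only: approximate X + S by W @ H with a gene-graph penalty.

    Assumed objective (not taken from the paper):
        ||X + S - W H||_F^2 + lam * tr(W^T (D - A) W) + 2 * tau * sum(S)
    with W, H, S >= 0 and S nonzero only where X == 0.
    """
    X = np.asarray(X, dtype=float)      # genes x cells, nonnegative
    rng = np.random.default_rng(seed)
    m, n = X.shape
    mask = (X == 0)                     # candidate dropout positions

    # Assumed gene similarity: cosine similarity between gene rows.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)
    A = Xn @ Xn.T
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))          # degree matrix of the gene graph

    W = rng.random((m, k)) + eps        # gene factors
    H = rng.random((k, n)) + eps        # cell factors (low-dimensional embedding)
    S = np.zeros_like(X)                # dropout matrix

    for _ in range(n_iter):
        Y = X + S                       # current estimate of the dropout-free matrix
        # Graph-regularized multiplicative update for W.
        W *= (Y @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
        # Standard multiplicative update for H.
        H *= (W.T @ Y) / ((W.T @ W) @ H + eps)
        # Closed-form nonnegative update for S on the zero entries only;
        # tau shrinks small values so weakly supported zeros stay zero.
        S = mask * np.maximum(W @ H - tau, 0.0)

    return X + S, W, H, S
```

Under these assumptions, the first return value is the imputed matrix, in which observed nonzero counts are left unchanged and only zero entries can receive a value, and the columns of H could be passed to a clustering method as a reduced representation of the cells.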