Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Md Tauhidul Islam, Jen-Yeu Wang, Hongyi Ren, Xiaomeng Li, Masoud Badiei Khuzani, Shengtian Sang, Lequan Yu, Liyue Shen, Wei Zhao, Lei Xing
DOI: https://doi.org/10.1038/s41467-022-34595-w
IF: 16.6
2022-11-21
Nature Communications
Abstract: Single cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expressions are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values presents a challenge because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as self-consistent expression recovery machine (SERM), to impute the missing expressions. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing a self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation with orders of magnitude enhancement in computational efficiency in comparison to the state-of-the-art imputation techniques.
multidisciplinary sciences
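
The self-consistency step described in the abstract can be illustrated with a small sketch. This is not the SERM implementation: the abstract does not specify the network or how the matrix is partitioned, so the code below makes two loud assumptions. First, the learned distribution is represented by a hypothetical array `reference` (here just a synthetic sample standing in for whatever the neural network would produce). Second, "similarly distributed in different parts of the matrix" is read as quantile matching of fixed-size blocks to that reference. The function names `match_to_reference` and `self_consistent_recovery` and the `block_shape` parameter are invented for illustration.

```python
import numpy as np


def match_to_reference(block, reference):
    """Quantile-match the entries of `block` to the sample `reference`.

    Assumption: `reference` stands in for the distribution that the
    neural network learns in the paper; here it is just a 1-D array.
    """
    flat = block.ravel()
    # Rank every entry inside the block; a stable sort keeps ties deterministic.
    order = np.argsort(flat, kind="stable")
    ranks = np.empty(flat.size, dtype=np.int64)
    ranks[order] = np.arange(flat.size)
    # Turn ranks into quantiles in (0, 1) and read them off the reference.
    quantiles = (ranks + 0.5) / flat.size
    return np.quantile(reference, quantiles).reshape(block.shape)


def self_consistent_recovery(expr, reference, block_shape=(50, 50)):
    """Match each block of `expr` to the same reference distribution.

    Assumption: non-overlapping rectangular blocks; the paper may
    partition and recombine the matrix differently.
    """
    recovered = np.empty_like(expr, dtype=float)
    rows, cols = block_shape
    for i in range(0, expr.shape[0], rows):
        for j in range(0, expr.shape[1], cols):
            block = expr[i:i + rows, j:j + cols]
            recovered[i:i + rows, j:j + cols] = match_to_reference(block, reference)
    return recovered


# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(0)
clean = rng.gamma(shape=2.0, scale=1.0, size=(100, 80))
noisy = clean * (rng.random(clean.shape) > 0.5)  # about half the entries zeroed
reference = rng.gamma(shape=2.0, scale=1.0, size=5000)  # placeholder for the learned distribution
recovered = self_consistent_recovery(noisy, reference, block_shape=(50, 40))
```

Because every block is mapped onto the same reference quantiles, all blocks end up with the same value distribution while the rank order of entries inside each block is preserved. Tied zeros receive distinct values in an arbitrary but deterministic order, which is a limitation of this simplified sketch and not a claim about the published method.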