cnnImpute: missing value recovery for single cell RNA sequencing data

Wenjuan Zhang, Brandon Huckaby, John Talburt, Sherman Weissman, Mary Qu Yang
DOI: https://doi.org/10.1038/s41598-024-53998-x
IF: 4.6
2024-02-18
Scientific Reports
Abstract: The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized our ability to explore cellular diversity and unravel the complexities of intricate diseases. However, due to the inherently low signal-to-noise ratio and the presence of an excessive number of missing values, scRNA-seq data analysis encounters unique challenges. Here, we present cnnImpute, a novel convolutional neural network (CNN) based method designed to address the issue of missing data in scRNA-seq. Our approach starts by estimating missing probabilities, followed by constructing a CNN-based model to recover expression values with a high likelihood of being missing. Through comprehensive evaluations, cnnImpute demonstrates its effectiveness in accurately imputing missing values while preserving the integrity of cell clusters in scRNA-seq data analysis. It achieved superior performance in various benchmarking experiments. cnnImpute offers an accurate and scalable method for recovering missing values, providing a useful resource for scRNA-seq data analysis.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the prevalence of missing values in single-cell RNA sequencing (scRNA-seq) data. Because of technical limitations and biological heterogeneity, scRNA-seq data contain a large number of missing values (dropouts), which complicates downstream analysis. To address this, the authors developed cnnImpute, a convolutional neural network (CNN) based method for recovering missing expression values. The method first estimates dropout probabilities and then builds a CNN model to recover the expression values that have a high dropout probability. In the reported evaluations, cnnImpute imputes missing values accurately while preserving the integrity of cell clusters. It was compared with existing imputation methods such as ALRA, bayNorm, and DCA on multiple real and simulated datasets, and performed well in imputation accuracy, cell type detection, and differential expression analysis. It also improved downstream analyses, including cell clustering and the identification of truly differentially expressed genes. In summary, the paper presents cnnImpute as an accurate and scalable solution to missing values in scRNA-seq data.
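The two-step idea described above (estimate a dropout probability, then recover only the values that are likely dropouts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the logistic heuristic for the dropout probability, the 0.5 threshold, and the gene-wise mean predictor standing in for the CNN are all assumptions made for illustration only.

```python
import numpy as np


def estimate_dropout_prob(log_expr):
    """Hypothetical heuristic, NOT the paper's estimator.

    A zero is treated as more likely to be a dropout when the gene is
    highly expressed in the cells where it was detected. Observed
    (non-zero) entries get probability 0.
    """
    nonzero = log_expr > 0
    counts = nonzero.sum(axis=0)
    sums = np.where(nonzero, log_expr, 0.0).sum(axis=0)
    # Mean of the non-zero values per gene; 0 for genes never detected.
    gene_mean = np.divide(sums, counts,
                          out=np.zeros_like(sums, dtype=float),
                          where=counts > 0)
    # Logistic squashing centred on the median gene mean (an assumption).
    gene_prob = 1.0 / (1.0 + np.exp(-(gene_mean - np.median(gene_mean))))
    return np.where(nonzero, 0.0, gene_prob[None, :])


def gene_mean_predictor(log_expr):
    """Stand-in for the CNN: predicts each gene's mean non-zero value."""
    nonzero = log_expr > 0
    counts = np.maximum(nonzero.sum(axis=0), 1)
    means = np.where(nonzero, log_expr, 0.0).sum(axis=0) / counts
    return np.tile(means, (log_expr.shape[0], 1))


def impute(log_expr, predict=gene_mean_predictor, threshold=0.5):
    """Replace only entries whose dropout probability exceeds `threshold`."""
    targets = estimate_dropout_prob(log_expr) > threshold
    imputed = log_expr.copy()
    imputed[targets] = predict(log_expr)[targets]
    return imputed, targets


if __name__ == "__main__":
    # Toy matrix: rows are cells, columns are genes (log-scale expression).
    toy = np.array([[0.0, 5.0, 0.0],
                    [2.0, 0.0, 0.0],
                    [2.0, 5.0, 1.0]])
    imputed, targets = impute(toy)
    print(targets)
    print(imputed)
```

The point of the sketch is the control flow: observed values are never modified, and zeros with a low estimated dropout probability are left as zeros, since they may reflect genuine absence of expression. In the actual method a trained CNN would take the place of `gene_mean_predictor`.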