Abstract:

Background: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to low capture efficiency and stochastic gene expression, scRNA-seq data often contain a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know where the missing data are, how much data is missing, and what the values of these data are.

Methods: To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results with the zero-inflated model and false negative model results. Finally, we used a regression model to recover the data in the missing elements.

Results: We compared the raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data and on cells from the primary somatosensory cortex and the hippocampal CA1 region of the mouse brain. On the CML data, MISC discovered a trajectory branch from CP-CML to BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divides the pyramidal CA1 cells into different branches, which is direct evidence of subpopulations within pyramidal CA1. In addition, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary.

Conclusions: Our results showed that the MISC model improved cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.
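The three steps named in the Methods (flag candidate missing entries, intersect the candidate sets, fill the flagged entries by regression) can be illustrated with a small sketch. This is not the MISC implementation: the three boolean masks are hand-written stand-ins for the outputs of the classifier, the zero-inflated model and the false negative model, and the regression is a plain one-variable least-squares fit chosen only for illustration. All function and variable names are hypothetical.

```python
from statistics import mean


def intersect_masks(*masks):
    """Element-wise AND of equally shaped boolean matrices (lists of rows)."""
    return [[all(vals) for vals in zip(*rows)] for rows in zip(*masks)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (falls back to the mean of y
    when x has no variance)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def impute(expr, missing):
    """Fill flagged entries gene by gene. Rows are cells, columns are genes.

    Assumption for this sketch: the predictor for a cell is the mean of its
    unflagged values in all other genes.
    """
    n_cells, n_genes = len(expr), len(expr[0])
    out = [row[:] for row in expr]
    for g in range(n_genes):
        def predictor(c, g=g):
            vals = [expr[c][k] for k in range(n_genes)
                    if k != g and not missing[c][k]]
            return mean(vals) if vals else 0.0

        # Train only on cells where this gene is not flagged as missing.
        train = [c for c in range(n_cells) if not missing[c][g]]
        if not train:
            continue
        a, b = fit_line([predictor(c) for c in train],
                        [expr[c][g] for c in train])
        for c in range(n_cells):
            if missing[c][g]:
                out[c][g] = max(0.0, a + b * predictor(c))  # keep non-negative
    return out


# Toy expression matrix: 3 cells x 2 genes; the zero in cell 2 is suspect.
expr = [[2.0, 4.0], [4.0, 8.0], [6.0, 0.0]]
T, F = True, False
classifier_mask = [[F, F], [F, F], [F, T]]      # stand-in for step 1
zero_inflated_mask = [[F, F], [F, F], [T, T]]   # stand-in model output
false_negative_mask = [[F, F], [F, T], [F, T]]  # stand-in model output

missing = intersect_masks(classifier_mask, zero_inflated_mask,
                          false_negative_mask)
imputed = impute(expr, missing)
```

Only the entry flagged by all three masks is treated as missing, so entries that a single model flags on its own are left untouched; this mirrors the conservative role the intersection plays in the description above.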

scSAGAN: A scRNA-seq Data Imputation Method Based on Semi-Supervised Learning and Probabilistic Latent Semantic Analysis

scGGAN: single-cell RNA-seq imputation by graph-based generative adversarial network

Imputation in scRNA-seq Data Using Supervised Deep Generative Networks

High-throughput Single-Cell RNA-seq Data Imputation and Characterization with Surrogate-Assisted Automated Deep Learning

scCAN: Clustering With Adaptive Neighbor-Based Imputation Method for Single-Cell RNA-Seq Data

scMultiGAN: cell-specific imputation for single-cell transcriptomes with multiple deep generative adversarial networks

Cellular Similarity based Imputation for Single cell RNA Sequencing Data

scCGImpute: An Imputation Method for Single-Cell RNA Sequencing Data Based on Similarities between Cells and Relationships among Genes

scGCL: an imputation method for scRNA-seq data based on graph contrastive learning

scSSA: A Clustering Method for Single Cell RNA-seq Data Based on Semi-Supervised Autoencoder

AdImpute: An Imputation Method for Single-Cell RNA-Seq Data Based on Semi-Supervised Autoencoders

An efficient scRNA-seq dropout imputation method using graph attention network

scIMC: a Platform for Benchmarking Comparison and Visualization Analysis of scRNA-seq Data Imputation Methods

scWMC: Weighted Matrix Completion-Based Imputation of scRNA-seq Data Via Prior Subspace Information

NISC: Neural Network-Imputation for Single-Cell RNA Sequencing and Cell Type Clustering

MISC: missing imputation for single-cell RNA sequencing data

AGImpute: imputation of scRNA-seq data based on a hybrid GAN with dropouts identification

SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data

SAE-Impute: imputation for single-cell data via subspace regression and auto-encoders

scSemiGAN: a Single-Cell Semi-Supervised Annotation and Dimensionality Reduction Framework Based on Generative Adversarial Network

Imputing Single-Cell RNA-seq Data by Combining Graph Convolution and Autoencoder Neural Networks