Dimensionality Reduction of Single-Cell RNA Sequencing Data by Combining Entropy and Denoising AutoEncoder.

Xiaoshu Zhu, Jian Li, Yongchang Lin, Liquan Zhao, Jianxin Wang, Xiaoqing Peng
DOI: https://doi.org/10.1089/cmb.2022.0118
IF: 1.549
2022-01-01
Journal of Computational Biology
Abstract: Single-cell RNA sequencing (scRNA-seq) reveals cellular heterogeneity at high resolution by measuring gene expression in individual cells. However, scRNA-seq data still pose computational problems, including high dimensionality, high sparsity, and high noise. Dimensionality reduction is essential for addressing them, because it reduces the number of dimensions and also removes most of the zeros and noise. We therefore propose ScEDA, a hybrid dimensionality reduction algorithm for scRNA-seq data that integrates binning-based entropy with a denoising autoencoder. In ScEDA, a novel binning-based entropy estimation method selects efficient genes while removing noise. For each gene, the binning-based entropy describes how its expression differs across all cells, that is, the distribution of that gene's expression over all cells. Genes with low binning-based entropy are regarded as inefficient and removed. Moreover, Kullback-Leibler (KL) divergence is combined with the autoencoder to reconstruct the objective function so that it maximizes the similarity in distribution between the input data and the reconstructed data. Furthermore, Poisson-distributed noise is added to the original input data, and the denoising autoencoder is used to improve robustness. Compared with three other clustering methods, ScEDA provides superior average performance on 16 real scRNA-seq datasets, with an obvious improvement on large-scale datasets.
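The abstract does not give the exact binning scheme, entropy threshold, noise rate, or loss weighting, so the following is only a minimal NumPy sketch of the three ingredients it names, under stated assumptions: equal-width bins, a fixed entropy cutoff, additive Poisson noise with rate `lam`, and a plain discrete KL divergence. The function names and default values (`n_bins`, `threshold`, `lam`) are hypothetical and are not taken from the ScEDA implementation.

```python
import numpy as np


def binned_entropy(expr, n_bins=10):
    """Shannon entropy (bits) of one gene's expression over all cells,
    estimated from an equal-width histogram (assumed binning scheme)."""
    counts, _ = np.histogram(expr, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum())


def select_genes(X, n_bins=10, threshold=0.5):
    """X is a cells x genes matrix. Keep genes whose binned entropy exceeds
    `threshold` (a hypothetical cutoff chosen for illustration)."""
    ent = np.array([binned_entropy(X[:, j], n_bins) for j in range(X.shape[1])])
    keep = ent > threshold
    return X[:, keep], keep, ent


def add_poisson_noise(X, lam=1.0, seed=None):
    """Corrupt the input with additive Poisson noise before it is fed to a
    denoising autoencoder (the rate `lam` is an assumption)."""
    rng = np.random.default_rng(seed)
    return X + rng.poisson(lam, size=X.shape)


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q) between two non-negative vectors, each normalized
    to sum to one; `eps` guards against log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


if __name__ == "__main__":
    # Toy matrix: gene 0 is silent in every cell, gene 1 varies across cells.
    X = np.column_stack([np.zeros(10), np.arange(10.0)])
    X_sel, keep, ent = select_genes(X)
    print("entropy per gene:", ent)
    print("genes kept:", keep)
    print("noisy input shape:", add_poisson_noise(X_sel, seed=0).shape)
```

In a full pipeline, the noisy matrix would be the autoencoder input, the clean matrix the reconstruction target, and a KL term like the one above would be added to the reconstruction loss; how ScEDA weights these terms is not stated in the abstract.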