Cell-type composition analysis of scRNA-seq data with deep convolution neural network

He Ma, Zhihao Fang, Zongbin Liu, Yan Chen
DOI: https://doi.org/10.21203/rs.3.rs-71522/v1
2020-10-05
Abstract:

Background: With the rapid development of single-cell RNA sequencing (scRNA-seq), increasingly large-scale single-cell sequencing datasets are being generated, and the analysis of cell-type composition from single-cell transcriptomics faces major challenges as a result. Since the emergence of scRNA-seq technology, the size of sequencing datasets has grown more than a million-fold in just over a decade. Meanwhile, as more gene markers are discovered, the dimensionality of single-cell sequencing data keeps increasing. Together, these trends place more stringent requirements on dimensionality reduction and clustering algorithms. Given practical constraints such as noise, dropouts, and limited computational overhead, an effective and efficient method is needed that produces accurate results in a very short time and with competitive stability.

Results: We present scCAE, an effective and efficient method based on a convolutional autoencoder that accurately and rapidly analyzes cell-type composition from single-cell transcriptomics datasets. Among existing methods, scCAE achieved the best results on datasets that simulate the cell differentiation process, with adjusted Rand index (ARI) values of 69.64% and 68.83% on the 10-cluster and 25-cluster tasks, respectively. It also performs well under different dropout levels: when the sparsity level of the data matrix is 71%, scCAE achieves an ARI of 45.29%, the highest among existing methods. In terms of overhead, scCAE also compares favorably with several existing methods: it takes less time than most of them and uses much less memory than other neural-network-based algorithms.

Conclusions: scCAE produces more accurate and reasonable results in the analysis of cell-type composition. Because of its imputer design, it can handle a large number of dropouts in the data matrix. Because of its convolutional network structure, it has lower time and space overhead than other deep-learning-based methods. We therefore demonstrate that scCAE is a competitive method for analyzing cell-type composition from scRNA-seq data. We expect this study to be a stepping stone for further progress in single-cell transcriptomics analysis.
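
The abstract reports results as ARI scores at a given sparsity level. As a rough illustration of how these two quantities are commonly computed, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names `adjusted_rand_index` and `sparsity_level` are hypothetical, the toy inputs are invented, and treating "sparsity level" as the fraction of zero entries in the expression matrix is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two samples: the two partitions agree trivially.
        return 1.0

    # Pair counts from the contingency table and from its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def sparsity_level(matrix):
    """Fraction of zero entries in a cells-by-genes count matrix.

    Assumption: 'sparsity level' means the share of zero entries.
    """
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    true_types = [0, 0, 1, 1, 2, 2]
    predicted = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
    counts = [[0, 3, 0], [0, 0, 5]]
    print(adjusted_rand_index(true_types, predicted))
    print(sparsity_level(counts))
```

ARI is invariant to how clusters are named, so a clustering that recovers the true grouping under different labels still scores 1, while random assignments score near 0. A value reported as 69.64% corresponds to 0.6964 on this scale.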