scE2EGAE: Enhancing Single-cell RNA-Seq Data Analysis Through an End-to-End Cell-Graph-Learnable Graph Autoencoder with Differentiable Edge Sampling
Shuo Wang, Yuanning Liu, Hao Zhang, Zhen Liu
DOI: https://doi.org/10.21203/rs.3.rs-5279794/v1
2024-01-01
Abstract: Background: Single-cell RNA sequencing (scRNA-Seq) technology reveals biological processes and molecular-level genomic information among individual cells. Numerous computational methods, including methods based on graph neural networks (GNNs), have been developed to enhance scRNA-Seq data analysis. However, existing GNN-based methods usually construct fixed graphs by applying the k-nearest neighbors (KNN) algorithm, which may result in information loss.
Methods: To address this problem, we propose scE2EGAE, which learns cell graphs during training. First, the scRNA-Seq data are fed into a deep count autoencoder (DCA). Second, the hidden representations of the DCA are extracted and used to generate cell-to-cell graph edges through a straight-through estimator based on top-k sampling and Gumbel-softmax. Finally, the generated cell-to-cell graph and the scRNA-Seq data are fed into a GNN-based downstream task. Here, the downstream task is a graph autoencoder that denoises scRNA-Seq data.
Results: We evaluate scE2EGAE on eight public scRNA-Seq datasets and compare its performance with seven existing scRNA-Seq data denoising methods. The experiments cover: 1) denoising performance, measured by mean absolute error (MAE), Pearson correlation coefficient (PCC), and cosine similarity (CS); 2) clustering performance of the denoised results, measured by adjusted Rand index (ARI) and normalized mutual information (NMI); and 3) cell trajectory inference performance of the denoised results, measured by the pseudo-temporal ordering score (POS). The results show that scE2EGAE outperforms most of the compared methods, indicating that the learned cell-to-cell graphs capture genuine cell-to-cell relationships.
Conclusions: We validate the proposed scE2EGAE method by applying it to the denoising of scRNA-Seq data. The method is able to learn inter-cellular relationships and construct cell-to-cell graphs, thereby enhancing downstream analysis of scRNA-Seq data. Our approach may inform future research on GNN-based scRNA-Seq analysis methods and has broad potential applications.
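
The edge-generation step described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gumbel_noise`, `sample_edges`), the use of negative squared Euclidean distance as the similarity score, and the parameters `k`, `tau`, and `seed` are all assumptions made for illustration. The sketch shows only the forward pass in plain Python: scores between cell embeddings are perturbed with Gumbel noise, the top-k perturbed scores per cell become hard edges, and a temperature-scaled softmax over the same scores gives the soft weights that a straight-through estimator would use for gradients.

```python
import math
import random


def gumbel_noise(rng):
    # Gumbel(0, 1) sample via inverse transform: -log(-log(U)), U ~ Uniform(0, 1).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def sample_edges(embeddings, k, tau=1.0, seed=0):
    """Return (hard, soft) adjacency rows for a list of cell embeddings.

    hard[i][j] is 1 if cell j was sampled as a neighbour of cell i, else 0.
    soft[i][j] is the Gumbel-softmax weight of the same candidate edge.
    """
    n = len(embeddings)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and n - 1")
    rng = random.Random(seed)
    hard, soft = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                scores.append(-math.inf)  # exclude self-loops
                continue
            # Assumed similarity: negative squared Euclidean distance.
            d2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            scores.append(-d2 + gumbel_noise(rng))
        # Soft relaxation: softmax of the perturbed scores at temperature tau.
        top = max(scores)
        exps = [math.exp((s - top) / tau) for s in scores]
        total = sum(exps)
        soft.append([e / total for e in exps])
        # Hard selection: the k highest perturbed scores become edges.
        chosen = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
        hard.append([1 if j in chosen else 0 for j in range(n)])
    return hard, soft


# Toy example: four cells with 2-dimensional hidden representations.
cells = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
hard, soft = sample_edges(cells, k=2, tau=0.5, seed=0)
```

In an automatic-differentiation framework, the straight-through combination is commonly written as `hard + soft - stop_gradient(soft)`, so that the forward value equals the hard edges while gradients flow through the soft weights. Whether scE2EGAE uses exactly this form is not stated in the abstract.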