scGCC: Graph Contrastive Clustering with Neighborhood Augmentations for scRNA-Seq Data Analysis.

Sheng-Wen Tian, Jian-Cheng Ni, Yu-Tian Wang, Chun-Hou Zheng, Cun-Mei Ji
DOI: https://doi.org/10.1109/jbhi.2023.3319551
IF: 7.7
2023-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Single-cell RNA sequencing (scRNA-seq) has rapidly emerged as a powerful technique for analyzing cellular heterogeneity at the individual cell level. Cell clustering is a critical step in downstream scRNA-seq analysis, as it enables the identification of cell types and the discovery of novel cell subtypes. However, the characteristics of scRNA-seq data, such as high dimensionality, sparsity, dropout events, and batch effects, present significant computational challenges for clustering analysis. In this study, we propose scGCC, a novel graph self-supervised contrastive learning model, to address these challenges. scGCC comprises two main components: a representation learning module and a clustering module. The scRNA-seq data are first fed into the representation learning module for training, and the learned representations are then passed to the clustering module, which assigns cells to clusters. scGCC learns low-dimensional, denoised embeddings, which benefits the clustering task. We introduce Graph Attention Networks (GAT) for cell representation learning, which enables better feature extraction and improved clustering accuracy. Additionally, we propose five data augmentation methods that improve clustering performance and robustness by increasing data diversity and reducing overfitting. Experiments on 14 real-world datasets show that our model achieves high accuracy and robustness. We also perform downstream tasks, including batch effect removal, trajectory inference, and marker gene analysis, to verify the biological effectiveness of our model.
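The contrastive step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the five augmentations or the loss, so the sketch assumes a random-masking augmentation (`random_mask`, a hypothetical stand-in that mimics dropout events) and the widely used NT-Xent contrastive loss (`nt_xent`). A random linear projection stands in for the GAT encoder, and all names and parameter values are illustrative.

```python
import numpy as np


def random_mask(x, rate, rng):
    """Zero out roughly `rate` of the entries (assumed augmentation, mimics dropout events)."""
    keep = rng.random(x.shape) >= rate
    return x * keep


def l2_normalize(z, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss: row i of z1 and row i of z2 are two views of the same cell."""
    n = z1.shape[0]
    z = l2_normalize(np.concatenate([z1, z2], axis=0))
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()


# Toy data: 8 cells x 20 genes of Poisson counts, log-transformed.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(8, 20)).astype(float)
x = np.log1p(counts)

# Hypothetical encoder: a fixed random projection in place of the GAT.
W = rng.normal(size=(20, 4))
view1 = random_mask(x, 0.2, rng) @ W
view2 = random_mask(x, 0.2, rng) @ W

loss = nt_xent(view1, view2)
print(f"contrastive loss on toy data: {loss:.4f}")
```

In a full pipeline, the encoder weights would be trained to minimize this loss, and the resulting embeddings would then be handed to a clustering algorithm, matching the two-module structure the abstract describes.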