Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding

Jinxin Xie, Shanshan Ruan, Mingyan Tu, Zhen Yuan, Jianguo Hu, Honglin Li, Shiliang Li
DOI: https://doi.org/10.1038/s41388-024-03074-5
IF: 8.756
2024-06-05
Oncogene
Abstract: Single-cell transcriptome sequencing (scRNA-seq) is a high-throughput technique used to study gene expression at the single-cell level. Clustering analysis is a commonly used method in scRNA-seq data analysis, helping researchers identify cell types and uncover interactions between cells. However, the choice of a robust similarity metric in the clustering procedure is still an open challenge due to the complex underlying structures of the data and the inherent noise in data acquisition. Here, we propose a deep clustering method for scRNA-seq data called scRISE (scRNA-seq Iterative Smoothing and self-supervised discriminative Embedding model) to resolve this challenge. The model consists of two main modules: an iterative smoothing module based on graph autoencoders, which alternately denoises the data and refines the pairwise similarity so as to gradually incorporate cell structural features and enrich the data information; and a self-supervised discriminative embedding module with an adaptive similarity threshold for partitioning samples into the correct clusters. Our approach shows improved quality of data representation and clustering on seventeen scRNA-seq datasets compared with a number of state-of-the-art deep learning clustering methods. Furthermore, applying scRISE to biological analysis of the HNSCC dataset unveiled 62 informative genes, highlighting their potential roles as therapeutic targets and biomarkers.
oncology, genetics & heredity, biochemistry & molecular biology, cell biology
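
The abstract describes two ideas: alternating between denoising the data and refining pairwise similarity, and then using a similarity threshold to decide which cell pairs belong together. The sketch below illustrates that general loop only. It is not the scRISE implementation: a plain k-nearest-neighbor averaging step stands in for the graph autoencoder, the thresholds are fixed rather than adaptive, and every function name and parameter (`iterative_smoothing`, `pairwise_pseudo_labels`, `k`, `alpha`, `upper`, `lower`) is a hypothetical choice made for illustration.

```python
import numpy as np


def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return Xn @ Xn.T


def knn_graph(S, k):
    """Row-normalized, symmetrized kNN adjacency (with self-loops) built from S."""
    n = S.shape[0]
    S_no_self = S.copy()
    np.fill_diagonal(S_no_self, -np.inf)  # exclude self when picking neighbors
    idx = np.argsort(-S_no_self, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)  # make the graph undirected
    np.fill_diagonal(A, 1.0)  # each cell keeps part of its own signal
    return A / A.sum(axis=1, keepdims=True)


def iterative_smoothing(X, k=5, n_iter=3, alpha=0.5):
    """Alternate between rebuilding the similarity graph and smoothing over it.

    Assumption: neighbor averaging replaces the graph autoencoder of the paper.
    """
    X_s = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        S = cosine_similarity(X_s)  # refine similarity from current data
        P = knn_graph(S, k)
        X_s = (1.0 - alpha) * X_s + alpha * (P @ X_s)  # denoise using the graph
    return X_s, cosine_similarity(X_s)


def pairwise_pseudo_labels(S, upper=0.9, lower=0.5):
    """Label pairs as similar (1), dissimilar (0), or undecided (-1).

    Assumption: fixed thresholds; the paper describes an adaptive threshold.
    """
    labels = np.full(S.shape, -1, dtype=int)
    labels[S >= upper] = 1
    labels[S <= lower] = 0
    return labels


# Toy usage on synthetic data: two groups of 20 "cells" with 6 "genes" each.
rng = np.random.default_rng(0)
centers = np.zeros((2, 6))
centers[0, 0] = 10.0
centers[1, 1] = 10.0
truth = np.repeat([0, 1], 20)
X = centers[truth] + rng.normal(size=(40, 6))

X_smooth, S_smooth = iterative_smoothing(X, k=5, n_iter=3, alpha=0.5)
pair_labels = pairwise_pseudo_labels(S_smooth)
```

In a self-supervised setup, the confident pairs (labels 1 and 0) would serve as training targets for an embedding network while undecided pairs (label -1) are ignored; that training step is omitted here.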