scSRL: Siamese Representation Learning-based method for analyzing single-cell RNA-seq data.

Zhaoyang Sun, Ying Liu, Pei Liu, Wanwan Shi, Jiawei Luo
DOI: https://doi.org/10.1109/BIBM58861.2023.10385561
2023-01-01
Abstract: Single-cell RNA sequencing (scRNA-seq) technology is used to analyze cellular heterogeneity, conduct cell-level biological research, and derive novel insights into complex cellular systems. However, raw scRNA-seq data are not directly suitable for downstream analysis because of their high variability, sparsity, and dimensionality. In this study, we therefore propose scSRL, a new self-supervised framework based on siamese representation learning, which explores the intrinsic properties of cells by maximizing the similarity between positive pairs. These positive pairs are constructed through multiple data augmentation operations, which increase data diversity and help the model learn better latent representations. Moreover, scSRL employs a stop-gradient strategy to mitigate collapse in the siamese network. Notably, scSRL aggregates cells with similar functions without introducing negative samples, which avoids additional computational cost. We evaluated scSRL on 10 real datasets for downstream tasks including clustering, classification, and visualization, and it consistently showed outstanding performance across all of these fundamental tasks. We also performed pseudotime inference on two embryonic development datasets, where scSRL accurately reconstructed cell trajectories and described the cell developmental process. scSRL is open source and available at https://github.com/zysun17/scSRL.
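The abstract describes the training mechanism only at a high level. The following is a minimal NumPy sketch of a SimSiam-style objective under stated assumptions, not scSRL's actual implementation: the augmentation (random masking plus Gaussian noise), the linear encoder `W` and linear predictor `H`, the learning rate, and the helper names `augment`, `neg_cosine`, `neg_cosine_grad_p`, and `simsiam_step` are all hypothetical. It shows how two augmented views of the same cells form a positive pair, how a symmetric negative-cosine loss is computed without any negative samples, and how the stop-gradient is realized by treating the target embedding as a constant when computing gradients.

```python
import numpy as np


def augment(x, rng, mask_rate=0.2, noise_std=0.1):
    """Hypothetical augmentation for a cells-by-genes matrix: randomly mask
    (zero out) a fraction of entries, then add Gaussian noise.
    Illustrative choices only, not necessarily scSRL's operations."""
    keep = rng.random(x.shape) >= mask_rate
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)


def neg_cosine(p, z):
    """D(p, z): negative mean cosine similarity between matching rows."""
    p_hat = p / np.linalg.norm(p, axis=1, keepdims=True)
    z_hat = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p_hat * z_hat, axis=1))


def neg_cosine_grad_p(p, z):
    """Gradient of D(p, stopgrad(z)) with respect to p only.
    z is treated as a constant, so no gradient is produced for the
    target branch: this is the stop-gradient."""
    n = p.shape[0]
    p_norm = np.linalg.norm(p, axis=1, keepdims=True)
    p_hat = p / p_norm
    z_hat = z / np.linalg.norm(z, axis=1, keepdims=True)
    cos = np.sum(p_hat * z_hat, axis=1, keepdims=True)
    # d cos(p, z) / dp = (z_hat - cos * p_hat) / ||p||
    return -(z_hat - cos * p_hat) / p_norm / n


def simsiam_step(x, W, H, rng, lr=0.1):
    """One gradient step of a SimSiam-style objective with a linear encoder W
    and a linear predictor H (hypothetical stand-ins for real networks)."""
    x1, x2 = augment(x, rng), augment(x, rng)  # positive pair: two views per cell
    z1, z2 = x1 @ W, x2 @ W                    # encoder outputs
    p1, p2 = z1 @ H, z2 @ H                    # predictor outputs
    # Symmetric loss; each z appears only as a stop-gradient target on the other side.
    loss = 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
    g1 = 0.5 * neg_cosine_grad_p(p1, z2)
    g2 = 0.5 * neg_cosine_grad_p(p2, z1)
    # Backpropagate only through the predictor branch (p = z @ H, z = x @ W).
    grad_H = z1.T @ g1 + z2.T @ g2
    grad_W = x1.T @ (g1 @ H.T) + x2.T @ (g2 @ H.T)
    return loss, W - lr * grad_W, H - lr * grad_H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy log-transformed counts: 64 cells x 200 genes (synthetic, for illustration).
    x = np.log1p(rng.poisson(1.0, size=(64, 200)).astype(float))
    W = rng.normal(0.0, 0.1, size=(200, 32))
    H = np.eye(32) + rng.normal(0.0, 0.01, size=(32, 32))
    for step in range(5):
        loss, W, H = simsiam_step(x, W, H, rng)
        print(f"step {step}: loss = {loss:.4f}")
```

Because no negative pairs are formed, each step only compares the two views of the same cell rather than every cell against the rest of the batch. In the SimSiam formulation, the stop-gradient on the target branch is reported to be essential for avoiding collapse to a constant output. A practical implementation would rely on an autodiff framework (for example `torch.Tensor.detach()` or `jax.lax.stop_gradient`) instead of the hand-written gradients used here.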