Cell-Graph Compass: Modeling Single Cells with Graph Structure Foundation Model

Chen Fang, Zhilong Hu, Shaole Chang, Qingqing Long, Wentao Cui, Wenhao Liu, Cong Li, Yana Liu, Pengfei Wang, Zhen Meng, Jia Pan, Yuanchun Zhou, Guihai Feng, Linghui Chen, Xin Li
DOI: https://doi.org/10.1101/2024.06.04.597354
2024-06-06
Abstract: Inspired by advances in pre-trained large language models, a surge of studies in the life sciences has focused on building foundation models from large-scale single-cell RNA-seq data. These studies typically pre-train a transformer model on large-scale single-cell sequencing data and then fine-tune it for a variety of downstream tasks, achieving notable performance. However, these models share a common shortcoming: to use the transformer architecture, originally designed for textual data, they artificially impose a sequential structure on the genes within a cell, which simplifies the complex interactions between genes. Furthermore, they focus solely on transcriptomic data and neglect other relevant biological information. To address these issues, we introduce Cell-Graph Compass (CGC), the first foundation model that leverages graph structures to model single cells and describes cells from multiple perspectives, including transcriptional profiles, gene text summaries, transcription factor regulatory networks, gene co-expression patterns, and gene positional relationships. Incorporating self-attention mechanisms, we pre-trained the model on sequencing data from 50 million human single cells, yielding a robust digital representation of cells. Extensive downstream experiments demonstrate that our approach captures meaningful biological knowledge and achieves state-of-the-art (SOTA) results across a variety of problem scenarios.
Bioinformatics
What problem does this paper attempt to address?