Generative pretraining from large-scale transcriptomes for single-cell deciphering

Hongru Shen, Jilei Liu, Jiani Hu, Xilin Shen, Chao Zhang, Dan Wu, Mengyao Feng, Meng Yang, Yang Li, Yichen Yang, Wei Wang, Qiang Zhang, Jilong Yang, Kexin Chen, Xiangchun Li
DOI: https://doi.org/10.1016/j.isci.2023.106536
IF: 5.8
2023-04-01
iScience
Abstract: Exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach entitled generative pretraining from transcriptomes (tGPT) for learning feature representations of transcriptomes. tGPT is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors. We developed tGPT with 22.3 million single-cell transcriptomes and used four single-cell datasets to evaluate its performance on single-cell analysis tasks. In addition, we examined its applications on bulk tissues. The single-cell clusters and cell lineage trajectories derived from tGPT are highly aligned with known cell labels and states. The feature patterns of tumor bulk tissues learned by tGPT are associated with a wide range of genomic alteration events, prognosis, and treatment outcome of immunotherapy. tGPT represents a new analytical paradigm for integrating and deciphering massive amounts of transcriptome data, and it will facilitate the interpretation and clinical translation of single-cell transcriptomes.
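The idea of modeling a gene's rank in the context of its preceding neighbors can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the ordering rule, so sorting by descending expression, dropping unexpressed genes, breaking ties by gene name, and truncating to top_k genes are all assumptions made here for illustration, as are the function names and the toy expression values.

```python
from typing import Dict, List, Tuple


def rank_genes(expression: Dict[str, float], top_k: int = 4) -> List[str]:
    """Turn one cell's expression profile into an ordered gene sequence.

    Assumption: genes are ordered by descending expression, ties are broken
    alphabetically, and genes with zero expression are dropped.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in expressed[:top_k]]


def next_gene_pairs(ranked: List[str]) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs: predict each gene from the genes ranked before it."""
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Toy expression values for a single hypothetical cell (not real data).
cell = {"CD3E": 5.0, "GAPDH": 9.0, "MS4A1": 0.0, "ACTB": 7.5, "CD8A": 2.0}

ranked = rank_genes(cell)
pairs = next_gene_pairs(ranked)

for context, target in pairs:
    print(context, "->", target)
```

Each (context, target) pair corresponds to one next-token prediction step; an autoregressive model trained on such pairs would learn to predict which gene comes next in the ranking given the higher-ranked genes before it.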
multidisciplinary sciences