scMulan: a Multitask Generative Pre-Trained Language Model for Single-Cell Analysis

Haiyang Bian, Yixin Chen, Xiaomin Dong, Chen Li, Minsheng Hao, Sijie Chen, Jinyi Hu, Maosong Sun, Lei Wei, Xuegong Zhang
DOI: https://doi.org/10.1007/978-1-0716-3989-4_57
2024-01-01
Abstract: Gene expression could be perceived as a form of "cell language", with underlying regulatory mechanisms akin to biological grammar. Decoding this language is critical to understanding cellular functions and behaviors. In this study, we proposed a new pre-training paradigm that integrates rich metadata and pre-training tasks, and developed scMulan, a multitask generative pre-trained language model for single-cell analyses. Guided by different task prompts, scMulan can accomplish multiple tasks in a zero-shot manner, such as cell-type annotation, batch integration, and conditional cell generation. scMulan can also be extended to novel tasks through fine-tuning.