GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model
Ao Li,Shubai Chen,Xingjian He,Guihai Feng,Dechao Bu,Handong Li,Xuezhi Wang,Liqun Zhong,Ping Xu,Lina Cui,Xiaodong Yang,Chen Fang,Yuanchun Zhou,Yaning Li,Zhongming Liang,Hongmei Wang,Jingtao Guo,Qingqing Long,Jing Liu,Z. Meng,Fei Li,Wentao Cui,Zhaohui Yang,Zheng Li,Jie Jiang,Zichen Wang,Yana Liu,Yong Wang,Zhenpeng Man,Hefan Miao,Jiahao Zhang,Shihua Zhang,Ge Yang,Yangang Wang,Qinmeng Yang,Zijian Wang,Pengfei Wang,Jiaheng Zhou,Haiping Jiang,Yi Zhao,Guole Liu,Jingxi Dong,Yiyang Zhang,Zhilong Hu,Yiqiang Chen,Xin Li,Zixu Deng,Yao Tian
DOI: https://doi.org/10.1101/2023.09.26.559542
2023-09-28
bioRxiv
Abstract: Deciphering the universal gene regulatory mechanisms in diverse organisms holds great potential to advance our knowledge of fundamental life processes and facilitate research on clinical applications. However, the traditional research paradigm primarily focuses on individual model organisms, resulting in limited collection and integration of complex features of various cell types across species. Recent breakthroughs in single-cell sequencing and advances in deep learning techniques present an unprecedented opportunity to tackle this challenge. In this study, we developed GeneCompass, the first knowledge-informed, cross-species foundation model pre-trained on an extensive dataset of over 120 million single-cell transcriptomes from human and mouse. During pre-training, GeneCompass integrates four types of biological prior knowledge in a self-supervised manner to enhance the understanding of gene regulatory mechanisms. After fine-tuning on multiple downstream tasks, GeneCompass outperforms competing state-of-the-art models on multiple single-species tasks and unlocks new realms of cross-species biological investigation. Overall, GeneCompass marks a milestone in advancing knowledge of universal gene regulatory mechanisms and accelerating the discovery of key cell fate regulators and candidate targets for drug development.
Biology,Computer Science