GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model

Xiaodong Yang, Guole Liu, Guihai Feng, Dechao Bu, Pengfei Wang, Jie Jiang, Shubai Chen, Qinmeng Yang, Hefan Miao, Yiyang Zhang, Zhenpeng Man, Zhongming Liang, Zichen Wang, Yaning Li, Zheng Li, Yana Liu, Yao Tian, Wenhao Liu, Cong Li, Ao Li, Jingxi Dong, Zhilong Hu, Chen Fang, Lina Cui, Zixu Deng, Haiping Jiang, Wentao Cui, Jiahao Zhang, Zhaohui Yang, Handong Li, Xingjian He, Liqun Zhong, Jiaheng Zhou, Zijian Wang, Qingqing Long, Ping Xu, X-Compass Consortium, Hongmei Wang, Zhen Meng, Xuezhi Wang, Yangang Wang, Yong Wang, Shihua Zhang, Jingtao Guo, Yi Zhao, Yuanchun Zhou, Fei Li, Jing Liu, Yiqiang Chen, Ge Yang, Xin Li, Baoyang Hu, Wei Li, Fei Gao, Leqian Yu, Qi Gu, Weiwei Zhai, Zhengting Zou, Jingqi Yu, Wenhui Wu, Xinxin Lin, Yu Zou, Yongshun Ren, Fan Li, Yixiao Zhao, Yike Xin, Longfei Han, Shuyang Jiang, Kai Ma, Qicheng Chen, Haoyuan Wang, Huanhuan Wu, Chaofan He, Yilong Hu, Shuyu Guo, Yiyun Li, Zaitian Wang, Huimin He, Shan Zong, Jiajia Wang, Yan Chen, Chunyang Zhang, Chengrui Wang, Ran Zhang, Meng Xiao, Yining Wang, Xin Qin, Jiaxin Qin, Chenhao Li, Zhufeng Xu, Zeyuan Zhang, Xiaoning Qi, Wuliang Huang, Yaoru Luo, Qinxuan Luo, Ziwen Liu, Teng Wang, Yiming Huang, Shirui Li, Kangning Dong, Qunlun Shen
DOI: https://doi.org/10.1038/s41422-024-01034-y
2024-10-08
Abstract: Deciphering universal gene regulatory mechanisms in diverse organisms holds great potential for advancing our knowledge of fundamental life processes and facilitating clinical applications. However, the traditional research paradigm primarily focuses on individual model organisms and does not integrate various cell types across species. Recent breakthroughs in single-cell sequencing and deep learning techniques present an unprecedented opportunity to address this challenge. In this study, we built an extensive dataset of over 120 million human and mouse single-cell transcriptomes. After data preprocessing, we obtained 101,768,420 single-cell transcriptomes and used them to develop a knowledge-informed cross-species foundation model, named GeneCompass. During self-supervised pre-training, GeneCompass effectively integrated four types of prior biological knowledge to enhance our understanding of gene regulatory mechanisms. When fine-tuned for multiple downstream tasks, GeneCompass outperformed state-of-the-art models in diverse single-species applications and unlocked new realms of cross-species biological investigation. We also employed GeneCompass to search for key factors associated with cell fate transition and showed that the predicted candidate genes could successfully induce the differentiation of human embryonic stem cells into the gonadal fate. Overall, GeneCompass demonstrates the advantages of using artificial intelligence technology to decipher universal gene regulatory mechanisms and shows tremendous potential for accelerating the discovery of critical cell fate regulators and candidate drug targets.