NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu
DOI: https://doi.org/10.1016/j.gpb.2023.04.001
IF: 6.409
2023-01-01
Genomics Proteomics & Bioinformatics
Abstract: As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly uses proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations (e.g., the Evolutionary Scale Modelling (ESM)-1b embedding) from protein sequences through self-supervision. We represent each protein by its ESM-1b embedding and use logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results show that LR-ESM achieves performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we develop NetGO 3.0 to improve the performance of AFP extensively. NetGO 3.0 is freely accessible at .
Competing Interest Statement: The authors have declared no competing interest.
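The LR-ESM idea described in the abstract (a fixed embedding per protein, then logistic regression per function label) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the embeddings are random 8-dimensional toy vectors standing in for real ESM-1b embeddings, the labels TERM_A and TERM_B are hypothetical placeholders rather than real GO terms, and the plain gradient-descent trainer is not the authors' implementation or training setup.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_lr(X, y, step=0.5, epochs=200):
    """Fit one binary logistic regression (one label) by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # gradient of the log loss
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - step * g / n for wj, g in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

def toy_embedding(center, rng, noise=0.3):
    # Stand-in for a per-protein embedding (assumption: NOT a real ESM-1b vector).
    return [c + rng.gauss(0.0, noise) for c in center]

rng = random.Random(0)
CENTER_A = [1.0] * 4 + [0.0] * 4   # toy "proteins" annotated with TERM_A
CENTER_B = [-1.0] * 4 + [0.0] * 4  # toy "proteins" annotated with TERM_B

X = ([toy_embedding(CENTER_A, rng) for _ in range(20)]
     + [toy_embedding(CENTER_B, rng) for _ in range(20)])
labels = {
    "TERM_A": [1] * 20 + [0] * 20,
    "TERM_B": [0] * 20 + [1] * 20,
}

# One independent classifier per label (one-vs-rest), as in multi-label prediction.
models = {term: train_lr(X, y) for term, y in labels.items()}

# Score a new "protein" whose embedding sits at the TERM_A cluster centre.
scores = {term: sigmoid(dot(w, CENTER_A) + b) for term, (w, b) in models.items()}
print(scores)
```

In this toy setup each label gets its own classifier, so a protein can receive high scores for several labels at once, which is the property a multi-label task such as function prediction needs.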