Protein Function Prediction With Functional and Topological Knowledge of Gene Ontology

Yingwen Zhao,Zhihao Yang,Yongkai Hong,Yumeng Yang,Lei Wang,Yin Zhang,Hongfei Lin,Jian Wang
DOI: https://doi.org/10.1109/TNB.2023.3278033
Abstract:Gene Ontology (GO) is a widely used bioinformatics resource for describing biological processes, molecular functions, and cellular components of proteins. It covers more than 5000 terms hierarchically organized into a directed acyclic graph and known functional annotations. Automatically annotating protein functions by using GO-based computational models has been an area of active research for a long time. However, due to the limited functional annotation information and complex topological structures of GO, existing models cannot effectively capture the knowledge representation of GO. To solve this issue, we present a method that fuses the functional and topological knowledge of GO to guide protein function prediction. This method employs a multi-view GCN model to extract a variety of GO representations from functional information, topological structure, and their combinations. To dynamically learn the significance weights of these representations, it adopts an attention mechanism to learn the final knowledge representation of GO. Furthermore, it uses a pre-trained language model (i.e., ESM-1b) to efficiently learn biological features for each protein sequence. Finally, it obtains all predicted scores by calculating the dot product of sequence features and GO representation. Our method outperforms other state-of-the-art methods, as demonstrated by the experimental results on datasets from three different species, namely Yeast, Human and Arabidopsis. Our proposed method's code can be accessed at: https://github.com/Candyperfect/Master.
What problem does this paper attempt to address?