### Abstract

Accurate identification of protein function is critical for elucidating the mechanisms of life and designing new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained self-attention transformer models. The method was systematically tested on 1068 non-redundant benchmarking proteins and 3328 targets from the third Critical Assessment of Protein Function Annotation (CAFA) challenge. Experimental results showed that ATGO achieved a significant increase in GO prediction accuracy over state-of-the-art approaches in all three aspects: molecular function, biological process, and cellular component. Detailed data analyses showed that the major advantage of ATGO lies in its use of attention transformer models, which can extract discriminative functional patterns from the feature embeddings. Meanwhile, the proposed triplet network helps strengthen the association between functional similarity and feature similarity in the sequence embedding space. In addition, combining the network scores with complementary homology-based inferences further improved the accuracy and coverage of the predicted models. These results demonstrate a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotation from sequence alone.

### Availability

The benchmark dataset, standalone package, and online server for ATGO are available at <https://zhanggroup.org/ATGO/>.

### Author Summary

In the post-genome sequencing era, a major challenge in computational molecular biology is to annotate the biological functions of all genes and gene products. In the widely used Gene Ontology (GO), these functions are classified into three aspects: molecular function, biological process, and cellular component. In this work, we proposed a new open-source deep-learning architecture, ATGO, to deduce the GO terms of proteins from the primary amino acid sequence by integrating a triplet neural network with attention transformer models. Large benchmark tests showed that, when powered with a pre-trained self-attention transformer model, ATGO significantly outperformed other state-of-the-art approaches in all GO aspect predictions. Self-attention neural network techniques have progressed rapidly and have demonstrated remarkable impact on language processing, multi-sensory data processing, and, most recently, protein structure prediction. This study showed that attention transformer models also hold significant potential for protein function annotation.

### Competing Interest Statement

The authors have declared no competing interest.
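
### Illustrative Sketch: Triplet Margin Loss

The abstract states that the triplet network strengthens the association between functional similarity and feature similarity in the embedding space. As a rough illustration of that general idea, the sketch below computes the standard triplet margin loss on toy vectors. It is not taken from the ATGO code: the function names, the Euclidean distance, the margin value, and the two-dimensional "embeddings" are all assumptions made for illustration, and the loss actually used by ATGO may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (hypothetical helper, not ATGO's API).

    The loss is zero once the anchor is closer to the positive
    (functionally similar) example than to the negative (functionally
    dissimilar) example by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D vectors standing in for sequence embeddings (assumed values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: the constraint is satisfied, so no penalty.
loss_ok = triplet_margin_loss(anchor, positive, negative)

# Swapped roles: the "positive" is now far away, so the loss is large.
loss_bad = triplet_margin_loss(anchor, negative, positive)

print(loss_ok, loss_bad)
```

Minimizing such a loss over many triplets pulls embeddings of functionally similar proteins together and pushes dissimilar ones apart, which is the behaviour the abstract attributes to the triplet network.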
