Abstract:
Motivation: Proteins play pivotal roles in biological systems, and accurate prediction of their functions is indispensable for practical applications. Although high-throughput techniques have produced a surge in protein sequence data, determining the exact functions of proteins still demands considerable time and resources. Many current methods predict function from protein sequences, whereas methods that start from protein structures are scarce and typically use either Convolutional Neural Networks (CNN) or Graph Convolutional Networks (GCN) alone.
Results: To address these challenges, we start from protein structures and propose a method that combines CNN and GCN in a unified framework, the Two-model Adaptive Weight Fusion Network (TAWFN), for protein function prediction. First, amino acid contact maps and sequences are extracted from the protein structure. The sequence is then used to generate one-hot encoded features and deep semantic features. These features, together with the graph constructed from the contact map, are fed into the Adaptive Graph Convolutional Networks (AGCN) module and the Multilayer Convolutional Neural Network (MCNN) module, each receiving the inputs it requires and producing preliminary classification results. Finally, the preliminary results are passed to the adaptive weight computation network, which calculates adaptive weights and uses them to fuse the predictions of the two networks into the final prediction. To evaluate the method, experiments were conducted on the PDBset and AFset datasets. For the molecular function, biological process, and cellular component tasks, TAWFN achieved Area Under the Precision-Recall curve (AUPR) values of 0.718, 0.385, and 0.488, respectively, with corresponding Fmax scores of 0.762, 0.628, and 0.693 and Smin scores of 0.326, 0.483, and 0.454. These results show that TAWFN outperforms existing methods.
Availability and implementation: The TAWFN source code can be found at https://github.com/ss0830/TAWFN.
Supplementary information: Supplementary data are available at Bioinformatics online.
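The pipeline described in the Results can be illustrated with a minimal sketch. It is not the TAWFN implementation (see the repository above for that) and rests on several assumptions: the function names `one_hot`, `contact_map`, and `adaptive_fuse` are hypothetical, the 10 angstrom C-alpha distance threshold is an illustrative choice not stated in the abstract, and the two raw weight scores are passed in directly, whereas TAWFN computes its weights with a dedicated network from the preliminary predictions.

```python
import math

# The 20 standard amino acids; any other symbol is encoded as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """One-hot encode a protein sequence, one row of 20 values per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def contact_map(ca_coords, threshold=10.0):
    """Binary contact map from C-alpha coordinates.

    Entry (i, j) is 1 when residues i and j lie within `threshold`
    angstroms of each other. The threshold value is an assumption.
    """
    n = len(ca_coords)
    return [
        [1 if math.dist(ca_coords[i], ca_coords[j]) <= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def adaptive_fuse(p_gcn, p_cnn, w_gcn, w_cnn):
    """Fuse two per-label probability vectors.

    The raw scores w_gcn and w_cnn are softmax-normalised so the two
    weights sum to 1, then used for a weighted average of the predictions.
    """
    m = max(w_gcn, w_cnn)  # subtract the maximum for numerical stability
    e_gcn, e_cnn = math.exp(w_gcn - m), math.exp(w_cnn - m)
    alpha = e_gcn / (e_gcn + e_cnn)
    return [alpha * g + (1.0 - alpha) * c for g, c in zip(p_gcn, p_cnn)]
```

With equal raw scores, `adaptive_fuse` reduces to a plain average of the two predictions; a larger `w_gcn` shifts the result toward the graph branch. Normalising the weights keeps every fused value within the range spanned by the two inputs, so the output can still be read as a per-label probability.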

TAWFN: A Deep Learning Framework for Protein Function Prediction
