Abstract

Background: High-throughput experimental techniques are producing proteomic and genomic datasets, in the form of genome-scale features and interaction networks, at a rapid rate. As a result, manually predicting the functional properties of proteins has become increasingly cumbersome, and computational methods that automate this annotation task are urgently needed. Most approaches to predicting protein functional properties require one of two things. They either identify a reliable set of labeled proteins whose attribute features are similar to those of the unannotated proteins, or they learn from a fully labeled protein interaction network containing a large amount of labeled data. In practice, however, such labels can be very difficult to acquire, especially for multi-label protein function prediction problems. Learning from only a few labeled proteins can lead to poor performance, because little supervision knowledge can be obtained either from similar proteins or from the connections between them. To annotate proteins effectively even when labeled data are scarce, it is important to exploit every data source available in this setting, including interaction networks, attribute features, correlations among functional labels, and unlabeled data.

Results: In this paper, we show that predicting the functional properties of proteins from various sources of relational data is, at heart, a typical collective classification (CC) problem in machine learning. We therefore cast protein function prediction with limited annotation into a semi-supervised multi-label collective classification (SMCC) framework. Within it, we propose a novel generative-model-based SMCC algorithm, called GM-SMCC, which computes the label probability distributions of unannotated proteins and uses them to predict their functional properties.
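The generative model behind GM-SMCC is not specified here, so the sketch below is not the authors' algorithm. It only illustrates, under stated assumptions, the collective classification idea the Results paragraph relies on. When just a few proteins are annotated, the label probabilities of the unannotated proteins are estimated from their interaction partners and refined iteratively, which lets supervision flow through the network. The sketch swaps in plain clamped label propagation and treats each functional label as an independent binary variable. It ignores attribute features and label correlations. The function names `propagate_labels` and `predict`, the 0.5 prior, the mixing weight `alpha`, and the toy proteins are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def propagate_labels(
    edges: List[Tuple[str, str]],
    labels: Dict[str, Set[str]],
    label_space: List[str],
    alpha: float = 0.8,
    iterations: int = 20,
) -> Dict[str, Dict[str, float]]:
    """Estimate P(label | protein) for every protein in an undirected network.

    Each functional label is treated as an independent binary variable
    (a simple multi-label assumption). Annotated proteins are clamped to
    0/1. Unannotated proteins are repeatedly set to a blend of their
    neighbours' current estimates and an uninformative prior.
    """
    prior = 0.5  # hypothetical uninformative prior
    neighbors: Dict[str, List[str]] = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    nodes = sorted(set(neighbors) | set(labels))

    # Initial estimates: observed labels for annotated proteins, prior otherwise.
    prob = {
        n: {c: float(c in labels[n]) if n in labels else prior for c in label_space}
        for n in nodes
    }
    for _ in range(iterations):
        updated: Dict[str, Dict[str, float]] = {}
        for n in nodes:
            if n in labels:  # clamp known annotations
                updated[n] = prob[n]
                continue
            nbrs = neighbors.get(n, [])
            updated[n] = {}
            for c in label_space:
                # Synchronous update: read only the previous iteration's estimates.
                nbr_mean = sum(prob[m][c] for m in nbrs) / len(nbrs) if nbrs else prior
                updated[n][c] = alpha * nbr_mean + (1 - alpha) * prior
        prob = updated
    return prob


def predict(
    prob: Dict[str, Dict[str, float]],
    labels: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Assign to each unannotated protein every label reaching the threshold."""
    return {
        n: sorted(c for c, p in dist.items() if p >= threshold)
        for n, dist in prob.items()
        if n not in labels
    }


if __name__ == "__main__":
    # Toy chain P1 - P2 - P3 - P4 with only the two end proteins annotated.
    edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
    labels = {"P1": {"A"}, "P4": {"B"}}
    prob = propagate_labels(edges, labels, ["A", "B"])
    print(predict(prob, labels))  # {'P2': ['A'], 'P3': ['B']}
```

On the toy chain, P2 sits next to the A-annotated protein and P3 next to the B-annotated one. Once the estimates converge (P2 reaches 9/14 for A), each unannotated protein therefore inherits the nearer label.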
To further improve prediction performance, we extend the method into an ensemble, called EGM-SMCC. The ensemble uses multiple heterogeneous networks whose latent linkages are constructed to explicitly model the relationships among nodes, so that supervision knowledge propagates effectively from labeled to unlabeled nodes.

Conclusion: Experiments on a yeast gene dataset, predicting both the functions and the localization of proteins, demonstrate the effectiveness of the proposed method. In the comparison, the proposed algorithms outperform the other algorithms evaluated.
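The ensemble idea can be illustrated with a similarly hedged sketch. What follows is not EGM-SMCC, and it assumes a particular way of building one extra network. Proteins are linked when their attribute sets have a Jaccard similarity of at least a hypothetical `threshold`. A simple clamped label propagation is then run on each network, and the resulting label probabilities are averaged. The helper names `similarity_edges`, `propagate`, and `ensemble_predict`, the Jaccard measure, the plain averaging, and the toy data are all illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
PRIOR = 0.5  # hypothetical uninformative prior for unannotated proteins


def similarity_edges(features: Dict[str, Set[str]], threshold: float) -> List[Edge]:
    """Hypothetical latent linkage: join proteins whose attribute sets
    have Jaccard similarity >= threshold."""
    edges = []
    for u, v in combinations(sorted(features), 2):
        union = features[u] | features[v]
        if union and len(features[u] & features[v]) / len(union) >= threshold:
            edges.append((u, v))
    return edges


def propagate(edges: List[Edge], labels: Dict[str, Set[str]], label_space: List[str],
              nodes: List[str], alpha: float = 0.8, iterations: int = 20):
    """Clamped multi-label propagation on one network (a simple baseline,
    not GM-SMCC). Returns P(label | protein) for every node."""
    nbrs: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    prob = {n: {c: float(c in labels[n]) if n in labels else PRIOR for c in label_space}
            for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in labels:  # annotated proteins keep their observed labels
                updated[n] = prob[n]
                continue
            updated[n] = {}
            for c in label_space:
                # Isolated proteins fall back to the prior.
                mean = sum(prob[m][c] for m in nbrs[n]) / len(nbrs[n]) if nbrs[n] else PRIOR
                updated[n][c] = alpha * mean + (1 - alpha) * PRIOR
        prob = updated
    return prob


def ensemble_predict(networks: List[List[Edge]], labels: Dict[str, Set[str]],
                     label_space: List[str], nodes: List[str], threshold: float = 0.5):
    """Average the per-network probabilities, then threshold each label."""
    runs = [propagate(edges, labels, label_space, nodes) for edges in networks]
    predictions: Dict[str, List[str]] = {}
    for n in nodes:
        if n in labels:
            continue
        avg = {c: sum(run[n][c] for run in runs) / len(runs) for c in label_space}
        predictions[n] = sorted(c for c, p in avg.items() if p >= threshold)
    return predictions


if __name__ == "__main__":
    nodes = ["P1", "P2", "P3", "P4", "P5"]
    labels = {"P1": {"A"}, "P4": {"B"}}
    ppi = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]  # P5 has no interactions
    features = {"P1": {"x1"}, "P2": {"x2"}, "P3": {"x3"},
                "P4": {"m1", "m2"}, "P5": {"m1", "m2"}}
    latent = similarity_edges(features, threshold=0.5)  # links only P4 and P5
    print(ensemble_predict([ppi, latent], labels, ["A", "B"], nodes))
    # {'P2': ['A'], 'P3': ['B'], 'P5': ['B']}
```

In the toy data, protein P5 has no interactions, so the PPI network alone leaves it at the uninformative prior. The attribute-similarity network links it to the annotated P4, and averaging the two networks then assigns it label B.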

Collective prediction of protein functions from protein-protein interaction networks

Protein Functional Properties Prediction in Sparsely-Label PPI Networks Through Regularized Non-Negative Matrix Factorization

Protein Function Prediction by Collective Classification with Explicit and Implicit Edges in Protein-Protein Interaction Networks

Effectively Predicting Protein Functions by Collective Classification — an Extended Abstract

Improving protein function prediction using domain and protein complexes in PPI networks

Prediction of Protein Function Using Protein-Protein Interaction Data

Efficient and Interpretable Prediction of Protein Functional Classes by Correspondence Analysis and Compact Set Relations

Prediction of Protein-Protein Interactions from Amino Acid Sequences Based on Continuous and Discrete Wavelet Transform Features

Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier

Global protein function prediction in protein-protein interaction networks

Semi-supervised multi-label collective classification ensemble for functional genomics

Predicting Protein Functions Based on Differential Co-expression and Neighborhood Analysis

Amalgamation of 3D structure and sequence information for protein–protein interaction prediction

Active Learning for Protein Function Prediction in Protein-Protein Interaction Networks

Predicting Protein-Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence

Predicting Protein Complexes Via the Integration of Multiple Biological Information

Inferring Protein Function by Domain Context Similarities in Protein-Protein Interaction Networks

Protein–protein interaction prediction based on ordinal regression and recurrent convolutional neural networks

Prediction Of Protein-Protein Interactions Using Subcellular And Functional Localizations

An Integrated Probabilistic Model for Functional Prediction of Proteins

From function to interaction: a new paradigm for accurately predicting protein complexes based on protein-to-protein interaction networks