Abstract:

BACKGROUND: Predicting the functional properties of proteins in protein-protein interaction (PPI) networks is a challenging problem with important implications for computational biology. Collective classification (CC), which uses both attribute features and relational information to jointly classify related proteins in a PPI network, has been shown to be a powerful computational method for this problem setting. CC usually increases accuracy when given a fully labeled PPI network with a large amount of labeled data. However, such labels can be difficult to obtain: many real-world PPI networks contain only a limited number of labeled proteins and a large number of unlabeled ones. In this case, most of the unlabeled proteins may not be connected to labeled ones, so supervision knowledge cannot be obtained effectively from local network connections. As a consequence, learning a CC model in sparsely-labeled PPI networks can lead to poor performance.

RESULTS: We investigate a latent graph approach that exploits various latent linkages and judiciously integrates them into an integrated latent graph, which links proteins with similar functions and separates proteins with different functions. We develop a regularized non-negative matrix factorization (RNMF) algorithm for CC that predicts protein functional properties using the data sources available in this problem setting, including attribute features, the latent graph, and unlabeled data. In RNMF, a label matrix factorization term and a network regularization term are incorporated into the non-negative matrix factorization (NMF) objective function to seek a factorization that respects both the network structure and the label information for classification.

CONCLUSION: Experimental results on the KDD Cup tasks of predicting the localization and functions of proteins corresponding to yeast genes demonstrate the effectiveness of the proposed RNMF method for predicting protein properties. In the comparison, the new method performs better than the other CC algorithms considered, especially when labeled proteins are scarce.
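
The abstract does not state the RNMF objective, so the following is only a minimal sketch of one plausible form, not the authors' formulation. It assumes the objective ||X - V U^T||^2 + alpha ||S (Y - V B)||^2 + beta Tr(V^T (D - W) V), where X holds non-negative protein attributes, W is the (latent) graph adjacency, D its degree matrix, Y the label matrix, and S a mask selecting labeled proteins. It uses standard multiplicative updates in the style of graph-regularized NMF. The function name `rnmf_sketch`, the weights `alpha` and `beta`, and the toy data are all hypothetical.

```python
import numpy as np

def rnmf_sketch(X, W, Y, labeled, k=4, alpha=1.0, beta=0.1,
                n_iter=200, seed=0, eps=1e-9):
    """Illustrative NMF with a label term and a graph regularizer.

    Assumed objective (not taken from the paper):
        ||X - V U^T||^2 + alpha * ||S (Y - V B)||^2 + beta * Tr(V^T (D - W) V)
    X: (n, d) non-negative attributes; W: (n, n) non-negative symmetric graph;
    Y: (n, c) non-negative labels; labeled: (n,) boolean mask.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = Y.shape[1]
    V = rng.random((n, k))                # latent protein representations
    U = rng.random((d, k))                # attribute basis
    B = rng.random((k, c))                # maps latent factors to labels
    s = labeled.astype(float)[:, None]    # diagonal of S as a column vector
    deg = W.sum(axis=1)[:, None]          # diagonal of D as a column vector
    SY = s * Y                            # labels of unlabeled proteins are masked out
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep all factors non-negative.
        U *= (X.T @ V) / (U @ (V.T @ V) + eps)
        B *= (V.T @ SY) / (V.T @ (s * V) @ B + eps)
        num = X @ U + alpha * SY @ B.T + beta * (W @ V)
        den = (V @ (U.T @ U) + alpha * (s * V) @ (B @ B.T)
               + beta * deg * V + eps)
        V *= num / den
        # Track the assumed objective value.
        recon = np.linalg.norm(X - V @ U.T) ** 2
        label = np.linalg.norm(s * (Y - V @ B)) ** 2
        smooth = np.sum(V * (deg * V - W @ V))   # Tr(V^T (D - W) V)
        history.append(recon + alpha * label + beta * smooth)
    return V, U, B, history

# Toy data (fabricated for illustration only): two groups of six "proteins",
# with a single labeled protein per group.
rng = np.random.default_rng(1)
n, d, c = 12, 6, 2
groups = np.array([0] * 6 + [1] * 6)
X = rng.random((n, d)) * 0.1
X[:6, :3] += 1.0
X[6:, 3:] += 1.0
W = (groups[:, None] == groups[None, :]).astype(float)
np.fill_diagonal(W, 0.0)
labeled = np.zeros(n, dtype=bool)
labeled[[0, 6]] = True
Y = np.eye(c)[groups] * labeled[:, None]

V, U, B, history = rnmf_sketch(X, W, Y, labeled, k=2)
pred = (V @ B).argmax(axis=1)   # predicted class index for every protein
```

In this sketch the unlabeled proteins receive supervision only indirectly, through the shared latent factors V and the graph smoothness term, which is the intuition the abstract gives for combining attribute features, the latent graph, and unlabeled data.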

Predicting Protein Functions Based on Differential Co-expression and Neighborhood Analysis.

Function Prediction For Hypothetical Proteins In Yeast Saccharomyces Cerevisiae Using Multiple Sources Of High-Throughput Data

Improving protein function prediction using domain and protein complexes in PPI networks

Prediction of Protein Function Using Protein-Protein Interaction Data

Predicting Protein Function Based on the Topological Structure of Protein Interaction Networks

Determining Protein Function by Protein-Protein Interaction Network

Global protein function prediction in protein-protein interaction networks

Finding Finer Functions for Partially Characterized Proteins by Protein-Protein Interaction Networks

An Integrated Probabilistic Model for Functional Prediction of Proteins.

Prediction of Essential Proteins Based on Subcellular Localization and Gene Expression Correlation.

Efficient and Interpretable Prediction of Protein Functional Classes by Correspondence Analysis and Compact Set Relations.

Widely Predicting Specific Protein Functions Based on Protein-Protein Interaction Data and Gene Expression Profile

A combinatorial optimization procedure for predicting protein functions

Protein Functional Properties Prediction in Sparsely-Label PPI Networks Through Regularized Non-Negative Matrix Factorization.

Characterizing Proteins with Finer Functions: A Case Study for Translational Functions of Yeast Proteins

A Network-Based Approach for Protein Functions Prediction Using Locally Linear Embedding

Protein Function Prediction With Functional and Topological Knowledge of Gene Ontology

Prediction of Protein Functions from Protein-Protein Interaction Data Based on a New Measure of Network Betweenness

A novel function prediction approach using protein overlap networks

Inferring Protein Function by Domain Context Similarities in Protein-Protein Interaction Networks

Identifying Novel Protein Phenotype Annotations by Hybridizing Protein–protein Interactions and Protein Sequence Similarities