Protein Complexes Prediction Via Positive and Unlabeled Learning of the PPI Networks

Jichao Zhao, Xun Liang, Yi Wang, Zhiming Xu, Yu Liu
DOI: https://doi.org/10.1109/icsssm.2016.7538432
2016-01-01
Abstract: A protein complex (complex for short) is a set of proteins that interact with each other to carry out specific biological activities. Traditional unsupervised clustering methods work by finding dense subgraphs in the protein-protein interaction (PPI) network, yet some complexes are not dense in the network. Supervised clustering methods treat known complexes as positive cases and all other subgraphs as negative cases, in an attempt to discover the sparse complexes hidden in the network. However, the subgraphs not known to be complexes contain many undetected complexes. These undetected positives are learned as negative cases, which seriously degrades the performance of supervised learning. Supervised clustering methods therefore face a PU (Positive and Unlabeled) learning problem, in which only positive cases are labeled. Complex prediction must address not only the construction of the PU learning model but also how to cluster. To this end, this paper considers 22 attributes of a complex, such as subgraph density, topological coefficients, and edge weights. We propose a PU-learning-based approach to complex prediction that mines complexes which traditional approaches cannot find. Experiments show that our method achieves higher accuracy than traditional approaches such as CFinder, CMC, MCODE, and AP.
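To make the setting concrete, the following is a minimal sketch, not the authors' implementation. It computes one of the attributes the abstract names (subgraph density) and labels candidate subgraphs in the PU style, where a candidate that is not a known complex is marked unlabeled rather than negative. The toy network, the function names `subgraph_density` and `label_candidates`, and the use of `None` for "unlabeled" are all assumptions made for illustration.

```python
from itertools import combinations


def subgraph_density(edges, nodes):
    """Density of the subgraph induced by `nodes`: 2|E_S| / (|S| (|S| - 1))."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Treat interactions as undirected by storing each edge as a frozenset.
    edge_set = {frozenset(e) for e in edges}
    internal = sum(
        1 for u, v in combinations(sorted(nodes), 2)
        if frozenset((u, v)) in edge_set
    )
    return 2.0 * internal / (n * (n - 1))


def label_candidates(candidates, known_complexes):
    """PU-style labels: 1 for a known complex, None (unlabeled) otherwise.

    Unlabeled candidates are deliberately NOT marked 0, because some of
    them may be complexes that have simply not been detected yet.
    """
    known = {frozenset(c) for c in known_complexes}
    return [1 if frozenset(c) in known else None for c in candidates]


# Hypothetical toy PPI network: a 4-clique plus a 4-protein chain.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("F", "G"), ("G", "H"),
]
dense_candidate = {"A", "B", "C", "D"}
sparse_candidate = {"E", "F", "G", "H"}

dense_score = subgraph_density(toy_edges, dense_candidate)
sparse_score = subgraph_density(toy_edges, sparse_candidate)
labels = label_candidates(
    [dense_candidate, sparse_candidate],
    known_complexes=[dense_candidate],
)
```

In this toy network the chain has a much lower density than the clique, so a method that looks only for dense subgraphs would tend to miss it. Keeping it unlabeled, instead of treating it as a negative case, leaves room for a PU learner to score it as a possible complex using the other attributes.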