On the Statistical Significance of Protein Complex

Youfu Su, Can Zhao, Zheng Chen, Bo Tian, Zengyou He
DOI: https://doi.org/10.1007/s40484-018-0153-6
2018-01-01
Quantitative Biology
Abstract: Background: Statistical validation of predicted complexes is a fundamental issue in proteomics and bioinformatics. The goal is to measure the statistical significance of each predicted complex in terms of a p-value. Surprisingly, this issue has not received much attention in the literature. To our knowledge, only a few research efforts have been made in this direction. Methods: In this article, we propose a novel method for calculating the p-value of a predicted complex. The null hypothesis is that there is no difference between the number of edges in the target protein complex and that in the random null model. In addition, we assume that a true protein complex must be a connected subgraph. Based on this null hypothesis, we present an algorithm to compute the p-value of a given predicted complex. Results: We test our method on five benchmark data sets to evaluate its effectiveness. Conclusions: The experimental results show that our method is superior to the state-of-the-art algorithms in assessing the statistical significance of candidate protein complexes.
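The idea of an edge-count p-value under a connectivity constraint can be illustrated with a small Monte Carlo sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a null model in which random connected node sets of the same size are drawn from the same interaction network, and it estimates an empirical p-value rather than an exact one. The function names, the toy network `adj`, and the choice to assign p = 1 to disconnected candidates are all illustrative assumptions.

```python
import random

def internal_edges(adj, nodes):
    """Count edges whose two endpoints both lie inside `nodes`."""
    nodes = set(nodes)
    # Each undirected edge is seen from both endpoints, hence the // 2.
    return sum(1 for u in nodes for v in adj[u] if v in nodes) // 2

def is_connected(adj, nodes):
    """Check that `nodes` induces a connected subgraph (depth-first search)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def random_connected_set(adj, size, rng):
    """Grow a connected node set by adding random frontier neighbors.

    Returns None if the start node's component is smaller than `size`.
    """
    start = rng.choice(sorted(adj))
    chosen = {start}
    frontier = set(adj[start]) - chosen
    while len(chosen) < size:
        if not frontier:
            return None
        v = rng.choice(sorted(frontier))
        chosen.add(v)
        frontier |= set(adj[v])
        frontier -= chosen
    return chosen

def empirical_p_value(adj, complex_nodes, n_samples=1000, seed=0):
    """Fraction of random connected sets with at least as many internal edges."""
    if not is_connected(adj, complex_nodes):
        # Assumption: a disconnected candidate is treated as non-significant.
        return 1.0
    rng = random.Random(seed)
    observed = internal_edges(adj, complex_nodes)
    size = len(set(complex_nodes))
    hits = drawn = 0
    while drawn < n_samples:
        sample = random_connected_set(adj, size, rng)
        if sample is None:
            continue  # component too small, draw again
        drawn += 1
        if internal_edges(adj, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)

# Hypothetical toy network: a 4-clique (A-D) attached to a path D-E-F-G-H.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

p_clique = empirical_p_value(adj, {"A", "B", "C", "D"})
p_path = empirical_p_value(adj, {"E", "F", "G", "H"})
```

In this toy network the dense clique should receive a smaller p-value than the sparse path, since few random connected sets of four nodes contain as many internal edges. The neighbor-growth sampler is not uniform over connected subgraphs, so the estimate depends on that sampling choice.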