Interrogating noise in protein sequences from the perspective of protein-protein interactions prediction.

Yongcui Wang, Xianwen Ren, Chunhua Zhang, Naiyang Deng, Xiangsun Zhang
DOI: https://doi.org/10.1016/j.jtbi.2012.09.007
IF: 2.405
2012-01-01
Journal of Theoretical Biology
Abstract: The past decades have witnessed extensive efforts to study the relationships among proteins. In particular, sequence-based protein–protein interaction (PPI) prediction is fundamentally important for speeding up the mapping of organisms' interactomes. High-throughput experimental methodologies have made the PPIs of many model organisms known, which allows us to apply machine learning methods to learn understandable rules from the available PPIs. Under the machine learning framework, composition vectors are usually applied to encode proteins as real-valued vectors. However, the composition vector values might be highly correlated with the background distribution of amino acids, i.e., amino acids that are frequently observed in nature tend to have large composition vector values. Thus, a formulation that estimates the noise induced by the background distribution of amino acids may be needed when representing proteins. Here, we introduce two kinds of denoising composition vectors, which have been successfully used in the construction of phylogenetic trees, to eliminate the noise. Surprisingly, when these two denoising composition vectors are validated on Escherichia coli (E. coli), Saccharomyces cerevisiae (S. cerevisiae) and human PPI datasets, the predictive performance is not improved and is even worse than that of non-denoised prediction. These results suggest that what is treated as noise in phylogenetic tree construction may be valuable information in PPI prediction.
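To make the encoding idea concrete, the following is a minimal Python sketch of a composition vector and one possible denoised variant. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not specify the two denoising schemes, so the background model used here (the expected k-mer frequency is the product of the single-residue frequencies in the same sequence) and the function names `composition_vector` and `denoised_composition_vector` are hypothetical choices.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq, k=2):
    """Relative frequency of every k-mer built from the 20 standard residues."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of standard residues contribute to the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def denoised_composition_vector(seq, k=2):
    """Relative deviation of observed k-mer frequency from a background estimate.

    ASSUMPTION: the background is an independence model, i.e. the expected
    frequency of a k-mer is the product of its residues' frequencies.
    This is an illustrative choice, not necessarily the paper's scheme.
    """
    observed = composition_vector(seq, k)
    single = composition_vector(seq, 1)
    denoised = {}
    for kmer, p_obs in observed.items():
        p_exp = 1.0
        for aa in kmer:
            p_exp *= single[aa]
        # k-mers whose expected frequency is zero get a neutral value of 0.
        denoised[kmer] = (p_obs - p_exp) / p_exp if p_exp > 0 else 0.0
    return denoised
```

Under this sketch, a residue pair that occurs exactly as often as its residue frequencies predict scores 0, so frequently observed amino acids no longer dominate the vector simply because they are common. The abstract's finding is that removing this kind of background signal did not help PPI prediction.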