Computational phosphorylation site prediction in plants using random forests and organism-specific instance weights

Brett Trost, Anthony Kusalik
DOI: https://doi.org/10.1093/bioinformatics/btt031
IF: 5.8
2013-01-22
Bioinformatics
Abstract: MOTIVATION: Phosphorylation is the most important post-translational modification in eukaryotes. Although many computational phosphorylation site prediction tools exist for mammals, and a few were created specifically for Arabidopsis thaliana, none are currently available for other plants. RESULTS: In this article, we propose a novel random forest-based method called PHOSFER (PHOsphorylation Site FindER) for applying phosphorylation data from other organisms to enhance the accuracy of predictions in a target organism. As a test case, PHOSFER is applied to phosphorylation sites in soybean, and we show that it more accurately predicts soybean sites than both the existing Arabidopsis-specific predictors and a simpler machine-learning scheme that uses only known phosphorylation sites and non-phosphorylation sites from soybean. In addition to soybean, PHOSFER will be extended to other organisms in the near future.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology