Prediction of RNA-binding residues in proteins using random forest

Xin Ma, Jing Guo, Xiao Sun
DOI: https://doi.org/10.3969/j.issn.1001-0505.2012.01.010
2012-01-01
Abstract: A method is proposed for predicting RNA-binding residues in protein sequences from a variety of features derived from amino acid sequence information, using the random forest (RF) algorithm. A novel feature, the position-specific scoring matrix combined with physicochemical properties (PSSM-PP), is proposed to represent the conservation information and physicochemical properties of residues. This feature, together with secondary structure information and orthogonal binary vectors, is used to build the RF model for predicting RNA-binding residues in proteins. The resulting classifier achieves a Matthews correlation coefficient (MCC) of 0.5336 and an overall accuracy (ACC) of 87.02%, with 51.16% sensitivity (SE) and 95.62% specificity (SP). The web server implementation is freely available.
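To make two of the terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one common way to build orthogonal binary (one-hot) vectors over a sliding window of residues, and the standard definitions of ACC, SE, SP, and MCC computed from confusion-matrix counts. The function names, the window half-width of 2, and the all-zero encoding for padding and non-standard residues are assumptions made for illustration; the abstract does not specify them. The PSSM-PP and secondary structure features are not reproduced here.

```python
import math

# The 20 standard amino acids, in an arbitrary fixed order (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-dimensional orthogonal binary vector for one residue.

    Residues outside the 20 standard letters map to an all-zero vector.
    """
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_window(sequence, center, half_width=2):
    """Concatenate one-hot vectors for a window centered on `center`.

    Positions that fall outside the sequence are padded with zeros.
    The half-width is a hypothetical choice, not taken from the paper.
    """
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(sequence):
            features.extend(one_hot(sequence[i]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, SE, SP, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    se = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    vector = encode_window("MKVRA", center=0)
    print(len(vector))  # 5 window positions x 20 dimensions
    print(binary_metrics(tp=50, tn=90, fp=10, fn=50))
```

In a setup like the one the abstract describes, per-residue vectors of this kind would be concatenated with the other features and passed to a random forest classifier. The large gap between sensitivity and specificity reported above is one reason MCC is given alongside ACC: with far fewer binding than non-binding residues, accuracy alone can look high even when many binding residues are missed.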