Sequence-Based Prediction of MicroRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines

Jian-Sheng Wu, Zhi-Hua Zhou
DOI: https://doi.org/10.1109/TCBB.2013.75
2013-01-01
Abstract: The recognition of microRNA (miRNA)-binding residues in proteins is helpful for understanding how miRNAs silence their target genes. It is difficult to use existing computational methods to predict miRNA-binding residues in proteins due to the lack of training examples. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning deals with methods for automatically exploiting unlabeled data in addition to labeled data to improve learning performance, where no human intervention is assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than nonbinding residues, and cost-sensitive learning has been deemed a good solution to the class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information of the amino acid sequence (position-specific scoring matrices), the conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves good performance, with an F1 score of 26.23 ± 2.55% and an AUC value of 0.805 ± 0.020, superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS is built and freely available for academic usage.
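The class imbalance mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' CS-LapSVM: the inverse-frequency cost rule, the helper names `inverse_frequency_costs` and `f1_score`, and the toy labels are all assumptions made for illustration. It shows why accuracy is misleading when binding residues are rare, and why a metric such as F1 and per-class misclassification costs are used instead.

```python
from collections import Counter


def inverse_frequency_costs(labels):
    """Give each class a misclassification cost inversely proportional
    to its frequency (a common heuristic, assumed here; the paper's
    actual cost setting is not given in the abstract)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 2 binding residues (1) among 18 nonbinding (0).
y_true = [1, 1] + [0] * 18
costs = inverse_frequency_costs(y_true)

# A trivial predictor that labels every residue as nonbinding.
all_negative = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
f1 = f1_score(y_true, all_negative)

print(costs)     # the rare binding class receives the larger cost
print(accuracy)  # high accuracy despite finding no binding residue
print(f1)        # F1 exposes the failure
```

In this toy setting the all-nonbinding predictor reaches 90% accuracy yet an F1 of zero, which is why a cost-sensitive learner penalizes errors on the rare binding class more heavily.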