CSBPI_Site: Multi-Information Sources of Features to RNA Binding Sites Prediction

Lichao Zhang, Zihong Huang, Liang Kong
DOI: https://doi.org/10.2174/1574893615666210108093950
2021-01-01
Current Bioinformatics
Abstract: Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and materials. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although several computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, a dataset with more samples is needed to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Methods: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties, and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among the three classifier algorithms, which shows that the novel features are stable and suit multiple classifiers. These results show that the proposed method is effective and robust for identifying noncoding RNA binding sites. Conclusion: Our method based on multi-information sources effectively represents binding site information among ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme obtained with CSBPI_Site indicate that our model is valuable for identifying functional sites.