DFpin: Deep Learning-Based Protein-Binding Site Prediction with Feature-Based Non-Redundancy from RNA Level.

Xiujuan Zhao, Yanping Zhang, Xiuquan Du
DOI: https://doi.org/10.1016/j.compbiomed.2022.105216
IF: 7.7
2022-01-01
Computers in Biology and Medicine
Abstract: The interaction between proteins and RNA is closely related to various human diseases. Computer-aided drug design can be facilitated by detecting the RNA sites that bind proteins. However, because binding sites cluster together in RNA sequences, extracting RNA fragments with a sliding window produces highly similar samples. To address this problem, we present a method, DFpin, to predict protein-interacting nucleotides in RNA. To retain more key nucleotide sites, we used a redundancy-removal method based on feature similarity: redundant samples are removed according to their RNA mono-nucleotide composition, which maintains the diversity of RNA samples and avoids leaving redundant data behind. In addition, to extract key abstract features and avoid over-fitting, we used the cascade structure of a deep forest model to predict protein-interacting nucleotides. Overall, DFpin demonstrated excellent classification performance, with an accuracy of 85.4% and an area under the curve of 93.3%. Compared with other methods, DFpin achieved higher accuracy, suggesting that feature-based redundancy removal and deep forest can help predict protein-interacting nucleotides. The source code and all datasets are available at: https://github.com/zhaoxj-tech/DFpin.git.
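The sliding-window extraction and composition-based redundancy removal described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window width, the L1 distance, and the greedy filtering rule are all assumptions made for illustration, since the abstract does not specify them. The actual procedure is in the linked repository.

```python
from collections import Counter


def sliding_windows(seq, width):
    """Yield (center_index, fragment) for every full window of odd `width`.

    Assumption: only windows that fit entirely inside the sequence are
    used; the abstract does not say how sequence ends are handled.
    """
    half = width // 2
    for center in range(half, len(seq) - half):
        yield center, seq[center - half:center + half + 1]


def mono_composition(fragment, alphabet="ACGU"):
    """Return the fraction of each nucleotide in the fragment."""
    counts = Counter(fragment)
    n = len(fragment)
    return tuple(counts[base] / n for base in alphabet)


def remove_redundant(fragments, threshold):
    """Greedy filter on mono-nucleotide composition (hypothetical rule).

    A fragment is kept only if the L1 distance between its composition
    and that of every previously kept fragment exceeds `threshold`.
    """
    kept, kept_feats = [], []
    for frag in fragments:
        feat = mono_composition(frag)
        is_novel = all(
            sum(abs(a - b) for a, b in zip(feat, other)) > threshold
            for other in kept_feats
        )
        if is_novel:
            kept.append(frag)
            kept_feats.append(feat)
    return kept


# Toy sequence; width and threshold are arbitrary illustrative values.
fragments = [frag for _, frag in sliding_windows("AAAAACGU", width=5)]
diverse = remove_redundant(fragments, threshold=0.5)
```

Adjacent windows overlap in all but one position, so their compositions are nearly identical. That overlap is the source of the sample similarity the abstract describes, and it is why a composition-based filter removes many neighbouring fragments.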