An updated dataset and a structure‐based prediction model for protein–RNA binding affinity

Xu Hong, Xiaoxue Tong, Juan Xie, Pinyu Liu, Xudong Liu, Qi Song, Sen Liu, Shiyong Liu
DOI: https://doi.org/10.1002/prot.26503
2023-04-28
Proteins: Structure, Function and Genetics
Abstract: Understanding protein–RNA interactions is essential for structural biology, and the thermodynamics of binding is an important part of uncovering the interaction mechanism. The regulatory networks between proteins and RNA in organisms are dominated by binding and dissociation events in cells. Determining the binding affinity of protein–RNA complexes can therefore help us understand how these interactions are regulated. Because measuring binding affinity experimentally is time‐consuming and labor‐intensive, computational methods for predicting it are urgently needed. To develop a binding affinity prediction model, we first update the protein–RNA binding affinity benchmark (PRBAB) dataset, which now includes 145 complexes. Second, we extract structural features from the complex structures, then analyze them and select representative features to train a regression model. Third, we randomly select a subset of PRBAB2.0 to fit the experimentally determined protein–RNA binding affinities. Finally, we test the model on the nonredundant PDBbind dataset, obtaining a Pearson correlation coefficient r = 0.57 and RMSE = 2.51 kcal/mol. The Pearson correlation coefficient reaches 0.7 when 5 complex structures with modified residues/nucleotides and metal ions are removed. On ProNAB, 71.60% of the predictions achieve a Pearson correlation coefficient r = 0.61 and RMSE = 1.56 kcal/mol against the experimental values.
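The two metrics reported in the abstract, the Pearson correlation coefficient r and the RMSE, can be computed from paired predicted and experimental affinities. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


def rmse(predicted, measured):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(predicted)
    if n != len(measured) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


# Hypothetical binding free energies in kcal/mol (not data from the paper).
predicted_dg = [-9.0, -10.5, -8.0, -12.0]
measured_dg = [-9.5, -10.0, -7.0, -12.5]

print(f"r = {pearson_r(predicted_dg, measured_dg):.2f}")
print(f"RMSE = {rmse(predicted_dg, measured_dg):.2f} kcal/mol")
```

Note that r measures how well the predictions track the ranking and linear trend of the experimental values, whereas RMSE measures the absolute error in kcal/mol, so a model can score well on one metric and poorly on the other.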