Machine-Learning Model for Predicting the Rate Constant of Protein-Ligand Dissociation

Minyi Su, Huisi Liu, Haixia Lin, Renxiao Wang
DOI: https://doi.org/10.3866/pku.whxb201907006
2020-01-01
Acta Physico-Chimica Sinica
Abstract: An increasing number of recent studies have shown that the binding kinetics of a drug molecule to its target correlates strongly with its efficacy in vivo. Ligand optimization oriented toward improved binding kinetics therefore offers new ideas for rational drug design. Currently, ligand binding kinetics is modeled mainly through extensive molecular dynamics simulations, which limits its application to real-world problems. The present study aimed to obtain a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (k(off)) of a ligand from the structure of its complex with the protein. This type of model is expected to be suitable for high-throughput tasks in structure-based drug design. We collected experimentally measured k(off) values for 406 ligand molecules from the literature and constructed a three-dimensional structural model of each protein-ligand complex through molecular modeling. A training set was compiled from 60% of those complexes, and the remaining 40% were assigned to two test sets. A random forest algorithm was used to derive the QSKR model from distance-dependent protein-ligand atom pair descriptors. Various random forest models were generated from descriptor sets obtained under different conditions, such as distance cutoff, bin width, and feature selection criteria, and their cross-validation results were examined. The optimal model was obtained when the distance cutoff was 15 Å (1 Å = 0.1 nm), the bin width was 3 Å, and the feature selection variance level was 2. The final QSKR model produced correlation coefficients of around 0.62 on the two independent test sets. This level of accuracy is at least comparable to that of the predictive models described in the literature, which are typically much more expensive computationally. Our study attempts to address the issue of predicting k(off) values in drug design. We hope that it can provide inspiration for further studies by other researchers.
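To make the idea of distance-dependent protein-ligand atom pair descriptors concrete, the following is a minimal sketch of how such a descriptor might be computed with the reported distance cutoff (15 angstrom) and bin width (3 angstrom). It is not the authors' implementation. The function names (atom_pair_descriptor, to_vector), the use of element symbols as atom types, the simple pair counting, and the convention that bins start at zero distance are all assumptions made for illustration; the abstract specifies only the cutoff and the bin width.

```python
import math
from collections import Counter
from itertools import product


def atom_pair_descriptor(protein_atoms, ligand_atoms, cutoff=15.0, bin_width=3.0):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    Each atom is a tuple (atom_type, (x, y, z)) with coordinates in angstrom.
    Atom typing by element symbol is an assumption of this sketch.
    """
    n_bins = int(round(cutoff / bin_width))  # 15 / 3 gives 5 bins
    counts = Counter()
    for (p_type, p_xyz), (l_type, l_xyz) in product(protein_atoms, ligand_atoms):
        d = math.dist(p_xyz, l_xyz)
        if d >= cutoff:
            continue  # pairs at or beyond the cutoff are ignored
        counts[(p_type, l_type, int(d // bin_width))] += 1
    return counts, n_bins


def to_vector(counts, protein_types, ligand_types, n_bins):
    """Flatten the counts into a fixed-length feature vector."""
    return [
        counts.get((p, l, b), 0)
        for p in protein_types
        for l in ligand_types
        for b in range(n_bins)
    ]


# Hypothetical toy coordinates, not taken from the paper's data set.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 7.0, 0.0))]

counts, n_bins = atom_pair_descriptor(protein, ligand)
features = to_vector(counts, ["C", "N", "O"], ["C", "O"], n_bins)
print(len(features), sum(features))
```

Feature vectors of this kind, one per complex, could then be passed to a random forest regressor together with the measured k(off) values. Changing the cutoff or bin width changes the vector length, which is why the abstract treats them as conditions to be compared by cross-validation.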