Improved prediction of ligand-protein binding affinities by meta-modeling

Ho-Joon Lee, Prashant S. Emani, Mark B. Gerstein
2024-10-21
Abstract: The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures, while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate improved generalization capability by our models using a large-scale benchmark of affinity prediction as well as a virtual screening application benchmark. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain meaningful improvement in binding affinity prediction.
Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the problem of accurately predicting the binding affinity between candidate drug ligands and target proteins, a key step in computational virtual screening for drug development. Existing affinity prediction models perform inconsistently across targets. To reduce such model-specific biases, the authors build a meta-modeling framework that integrates published force-field-based empirical docking models with sequence-based deep learning models. They evaluate many combinations of individual base models, training databases, and meta-modeling approaches, and show that many of the resulting meta-models significantly improve affinity prediction over the base models. The best meta-models perform comparably to state-of-the-art deep learning tools based exclusively on 3D structures, while offering better database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. The authors also report improved generalization on a large-scale affinity prediction benchmark and on a virtual screening application benchmark.
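To make the meta-modeling idea concrete, here is a minimal sketch of stacking two base predictors with a linear meta-model. It is not the authors' implementation: the data are synthetic, the names `docking_score` and `sequence_pred` are hypothetical stand-ins for a docking scoring function and a sequence-based deep learning model, and ordinary least squares is assumed only as the simplest possible meta-model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" affinities on a pKd-like scale (illustrative only).
n = 600
y = rng.normal(6.0, 1.5, size=n)

# Hypothetical base-model outputs (assumptions, not real tools):
# a docking-like score that is rescaled and offset relative to the truth,
# and a sequence-model-like prediction that is unbiased but noisier.
docking_score = 0.5 * y + 2.0 + rng.normal(0.0, 0.5, size=n)
sequence_pred = y + rng.normal(0.0, 1.0, size=n)


def design_matrix(*features):
    """Stack an intercept column and the base-model outputs column-wise."""
    return np.column_stack([np.ones(len(features[0])), *features])


def rmse(pred, target):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Fit the meta-model on one split and evaluate on a held-out split.
train, test = slice(0, 400), slice(400, n)
X_train = design_matrix(docking_score[train], sequence_pred[train])
X_test = design_matrix(docking_score[test], sequence_pred[test])

# Linear meta-model: least-squares weights over [1, docking, sequence].
weights, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
meta_pred = X_test @ weights

rmse_docking = rmse(docking_score[test], y[test])
rmse_sequence = rmse(sequence_pred[test], y[test])
rmse_meta = rmse(meta_pred, y[test])

print(f"docking-like RMSE:  {rmse_docking:.2f}")
print(f"sequence-like RMSE: {rmse_sequence:.2f}")
print(f"meta-model RMSE:    {rmse_meta:.2f}")
```

In this toy setting the meta-model can correct the scale and offset of one base model and average out the independent noise of both, which is the intuition behind reducing model-specific biases. Real base models may have correlated errors, so the gain on actual binding data could be smaller.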