How to make machine learning scoring functions competitive with FEP

Philip Biggin, Matthew Warren, Ísak Valsson, Charlotte Deane, Aniket Magarkar, Garrett Morris
DOI: https://doi.org/10.26434/chemrxiv-2024-bth5z
2024-06-24
Abstract: Machine learning offers a promising approach for fast and accurate binding affinity predictions. However, current models often fail to generalise beyond their training data and are not robustly evaluated on a diverse range of benchmarks, limiting their application in drug discovery projects. In this work, we address these issues by introducing a novel graph neural network model called AEV-PLIG (Atomic Environment Vector - Protein Ligand Interaction Graph), which encodes protein-ligand interactions via atomic environment vectors to improve generalisation. We evaluate our model on improved benchmarks, including our new out-of-distribution test set we call OOD Test, and two alternative benchmark systems used for free energy perturbation (FEP) calculations, and highlight competitive performance of AEV-PLIG across the board. Moreover, we demonstrate how augmented data can be leveraged to enhance prediction accuracy, and how enriching the training data with three complexes from a congeneric series of ligands binding to a target of interest improves performance further. Altogether, we show that these strategies improve the applicability of machine learning scoring functions and enable state-of-the-art performance nearing the accuracy of physics-based simulation methods, but at a fraction of their computational cost. This practical approach extends the predictive capabilities of machine learning for molecular discovery, paving the way for its broader use in computer-aided drug design.
Chemistry
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses two major issues in machine learning prediction of protein-ligand binding affinity:

1. **Lack of Generalization**: Current machine learning models often fail to generalize beyond their training data, which limits their application in drug discovery projects.
2. **Insufficient Evaluation**: Existing models are not robustly evaluated on a diverse range of benchmarks, including data drawn from outside the training distribution.

To tackle these problems, the authors introduce a new graph neural network model, AEV-PLIG (Atomic Environment Vector - Protein Ligand Interaction Graph), which encodes protein-ligand interactions via atomic environment vectors, and evaluate it on improved benchmarks. Specifically, they constructed a new out-of-distribution benchmark, "OOD Test", designed to penalize models for memorizing ligands or proteins, so that a good score reflects generalization to unseen data. They also evaluate the model on two benchmark systems used for free energy perturbation (FEP) calculations.

Additionally, the authors show that augmented data improves prediction accuracy, particularly on drug discovery-related benchmarks, and that enriching the training data with three complexes from a congeneric series of ligands binding to the target of interest improves performance further. With these strategies, AEV-PLIG approaches the accuracy of physics-based simulation methods at a fraction of their computational cost.
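The summary does not say how OOD Test is actually built, so the following is only a generic sketch of one common way to keep a test set from rewarding memorized ligands: drop any candidate test ligand that is too similar to something in the training set. Everything here is an assumption made for illustration, including the function names `tanimoto` and `split_ood`, the 0.3 cutoff, and the toy fingerprints (sets of integer "bits"). It is not the authors' procedure, and it ignores the protein side, which the real benchmark is said to account for as well.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # hypothetical: a ligand as a set of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_ood(
    train: Dict[str, Fingerprint],
    candidates: Dict[str, Fingerprint],
    max_similarity: float = 0.3,  # illustrative cutoff, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Keep candidates whose most similar training ligand is below the cutoff."""
    kept: List[str] = []
    rejected: List[str] = []
    for name, fp in candidates.items():
        nearest = max((tanimoto(fp, t) for t in train.values()), default=0.0)
        if nearest < max_similarity:
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected


# Toy data: two training ligands and three candidate test ligands.
train = {
    "lig_a": frozenset({1, 2, 3, 4}),
    "lig_b": frozenset({5, 6, 7}),
}
candidates = {
    "near_copy": frozenset({1, 2, 3, 4, 9}),  # shares 4 of 5 bits with lig_a
    "novel": frozenset({10, 11, 12}),         # shares nothing with the training set
    "partial": frozenset({5, 20, 21, 22}),    # shares 1 of 6 bits with lig_b
}

kept, rejected = split_ood(train, candidates)
print("kept:", kept)
print("rejected:", rejected)
```

With these toy inputs, `near_copy` is rejected because its similarity to `lig_a` is 0.8, while `novel` and `partial` stay in the test set. In a real pipeline the fingerprints would come from a cheminformatics toolkit, and a comparable filter on protein similarity would be needed to cover the "memorizing proteins" half of the problem.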