Proximity Graph Networks: Predicting Ligand Affinity with Message Passing Neural Networks

Michael J. Keiser, Zachary J. Gale-Day, Laura Shub, Kangway V. Chuang
DOI: https://doi.org/10.26434/chemrxiv-2024-hznxh
2024-02-23
Abstract: Message-passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein-ligand complex scoring tasks. Here, we describe the Proximity Graph Network (PGN) package, an open-source toolkit that constructs ligand-receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with Proximity Graph data structures augment the prediction of ligand-receptor complex properties when ligand-receptor data are available.
Chemistry
What problem does this paper attempt to address?
This paper aims to improve the accuracy of binding affinity prediction for protein-ligand complexes. The authors introduce Proximity Graph Networks (PGNs), an open-source toolkit built on Message Passing Neural Networks (MPNNs) that constructs ligand-receptor graphs based on atomic proximity. The method aims to improve model performance by allowing information to pass between ligand and protein atoms during learning. The paper also shows how different MPNN architectures suit different tasks and emphasizes the importance of a modular framework for evaluating them.

### Main Research Questions

1. **Improving the accuracy of binding affinity prediction**:
   - The paper explores how PGNs can improve the prediction of binding affinity for protein-ligand complexes. With a graph structure based on atomic proximity, the model can capture interactions between ligands and receptors more effectively, which improves prediction accuracy.
2. **Evaluating the performance of different MPNN architectures**:
   - The authors test multiple MPNN architectures (such as PFP, DMPNN, and GGNET) and evaluate them on different datasets to determine which architecture performs best on a specific task. The results show that the best-performing architecture differs from task to task.
3. **Verifying the generalization ability of the model**:
   - The paper pays special attention to how well the model generalizes to unseen receptors. The authors evaluate generalization on the PDBbind dataset and the D4 diverse docking dataset. The results show that PGNs perform well in these tasks, especially on the D4 diverse docking dataset.
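The proximity-based graph construction described above can be illustrated with a minimal sketch. It is not the PGN package's API: the function names, the 4.5 Å cutoff, the coordinates, and the scalar atom features are all assumptions chosen for illustration. The sketch connects each ligand atom to every protein atom within the cutoff, then runs one untrained, sum-aggregation message passing step so that protein information reaches the ligand atoms.

```python
import numpy as np

def build_proximity_edges(ligand_xyz, protein_xyz, cutoff=4.5):
    """Return (ligand_idx, protein_idx) pairs closer than `cutoff` angstroms.

    The cutoff value is a hypothetical choice, not one taken from the paper.
    """
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    # Pairwise distances, shape (n_ligand, n_protein), via broadcasting.
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    lig_idx, prot_idx = np.nonzero(dist <= cutoff)
    return list(zip(lig_idx.tolist(), prot_idx.tolist()))

def message_pass_once(ligand_feats, protein_feats, edges):
    """Toy update: add the features of each protein neighbor to the ligand atom.

    A real MPNN would apply learned message and update functions here.
    """
    ligand_feats = np.asarray(ligand_feats, dtype=float)
    protein_feats = np.asarray(protein_feats, dtype=float)
    updated = ligand_feats.copy()
    for lig_i, prot_j in edges:
        updated[lig_i] += protein_feats[prot_j]
    return updated

# Made-up coordinates (angstroms) and one scalar feature per atom.
ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
protein = [[3.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
edges = build_proximity_edges(ligand, protein, cutoff=4.5)
print(edges)  # [(0, 0), (0, 2), (1, 0), (1, 2)]

updated = message_pass_once([[1.0], [2.0]], [[10.0], [20.0], [30.0]], edges)
print(updated.tolist())  # [[41.0], [42.0]]
```

The protein atom 10 Å away gets no edge, so it contributes nothing to either ligand atom. The dense distance matrix is simple but scales as the product of the atom counts; a spatial index would be a natural substitute for large receptors.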
### Methods and Experiments

- **Datasets**:
  - PDBbind 2019 Refined Set: 4,852 high-quality ligand-receptor complexes.
  - PDBbind 2019 General Set: 17,679 ligand-receptor complexes.
  - D4 Diverse Docking Set: 86,452 ligands docked onto the dopamine D4 receptor.
  - D4 Experimental Dataset: 510 ligands with experimental binding data.
- **Model Evaluation**:
  - Root Mean Square Error (RMSE) and Pearson Correlation Coefficient (PCC) serve as evaluation metrics.
  - The best model configuration is selected through cross-validation and hyperparameter optimization.

### Results

- **PDBbind Datasets**:
  - The PFP encoder significantly outperforms the baseline model in the protein-split task, indicating that the graph model generalizes better.
  - The DMPNN architecture performs well on the PDBbind General dataset, especially in the random-split task.
- **D4 Diverse Datasets**:
  - All graph models significantly outperform the baseline model, with the DMPNN model performing best.
  - Model performance on the similarity-split dataset is comparable to that on the random-split dataset, possibly because the dataset itself is already very diverse.

### Conclusions

- **Main Contributions**:
  - Introduced PGNs, an open-source toolkit based on Message Passing Neural Networks, for constructing ligand-receptor graphs.
  - Demonstrated performance differences between MPNN architectures across tasks and emphasized the importance of the modular framework.
  - Verified the effectiveness and generalization ability of PGNs in predicting ligand-receptor binding affinity.
- **Future Work**:
  - Explore more diverse graph convolution methods, such as deep tensor networks.
  - Investigate data augmentation techniques to improve model performance in low-data settings.
  - Apply PGNs to fields such as molecular dynamics simulation and virtual screening.
Through these studies, the authors hope to provide a powerful tool for the fields of drug design and computational chemistry to more accurately predict the properties of ligand-receptor complexes.
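For reference, the two evaluation metrics named in the methods summary, RMSE and PCC, can be computed as in the sketch below. The function names and the affinity values are hypothetical and are not taken from the paper or the PGN package. RMSE is $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, and PCC is the covariance of the true and predicted values divided by the product of their standard deviations.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Center both series, then normalize the summed cross-products.
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float(np.sum(yt * yp) / np.sqrt(np.sum(yt ** 2) * np.sum(yp ** 2)))

# Made-up affinities: every prediction is off by exactly one unit.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]
error = rmse(y_true, y_pred)
corr = pearson(y_true, y_pred)
print(error, corr)
```

In this example the predictions are shifted by a constant, so the correlation is perfect while the RMSE is one unit. The two metrics therefore complement each other: PCC measures how well the ranking and trend are captured, and RMSE measures absolute error.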