Improving generalisability of 3D binding affinity models in low data regimes

Julia Buhmann, Ward Haddadin, Lukáš Pravda, Alan Bilsland, Hagen Triendl

2024-09-19

Abstract: Predicting protein-ligand binding affinity is an essential part of computer-aided drug design. However, generalisable and performant global binding affinity models remain elusive, particularly in low data regimes. Despite the evolution of model architectures, current benchmarks are not well-suited to probe the generalisability of 3D binding affinity models. Furthermore, 3D global architectures such as GNNs have not lived up to performance expectations. To investigate these issues, we introduce a novel split of the PDBBind dataset, minimizing similarity leakage between train and test sets and allowing for a fair and direct comparison between various model architectures. On this low similarity split, we demonstrate that, in general, 3D global models are superior to protein-specific local models in low data regimes. We also demonstrate that the performance of GNNs benefits from three novel contributions: supervised pre-training via quantum mechanical data, unsupervised pre-training via small molecule diffusion, and explicitly modeling hydrogen atoms in the input graph. We believe that this work introduces promising new approaches to unlock the potential of GNN architectures for binding affinity modelling.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses how well protein-ligand binding affinity models generalize, especially when training data is limited. Specifically:

1. **Model Generalization**: Global binding affinity models generalize poorly in low data regimes, and current benchmarks are not well suited to measuring this. The paper introduces a new split of the PDBBind dataset that minimizes similarity leakage between the train and test sets, so that different model architectures can be compared fairly and directly.
2. **Model Architecture Comparison**: The study compares 3D global models, such as graph neural networks (GNNs), with protein-specific local models. On the low similarity split, the 3D global models are, in general, superior in low data regimes.
3. **Novel Pre-training Strategies**: Two pre-training methods are proposed to improve GNN performance: supervised pre-training on quantum mechanical data and unsupervised pre-training via small molecule diffusion. The paper reports that GNN performance benefits from both.
4. **Role of Hydrogen Atoms**: The paper examines the effect of explicitly modeling hydrogen atoms in the input graph and reports that GNN performance also benefits from this.

In summary, the paper aims to improve the generalization and performance of binding affinity models in low data regimes, contributing a low similarity benchmark split and three techniques for strengthening GNN architectures.
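The idea of a split that limits similarity leakage can be illustrated with a minimal sketch. This is not the authors' procedure for PDBBind, which is not described on this page; it is one generic way to build such a split, assuming each complex is represented by a set of features (a hypothetical stand-in for, say, a ligand fingerprint). Complexes whose Jaccard similarity reaches a threshold are linked, linked complexes are grouped into clusters, and whole clusters are assigned to either train or test. The names `jaccard`, `similarity_clusters`, `low_similarity_split`, the threshold of 0.5, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_clusters(features, threshold):
    """Group items into connected components of the graph that links
    every pair with similarity >= threshold (simple union-find)."""
    parent = {key: key for key in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for key in features:
        clusters.setdefault(find(key), []).append(key)
    return list(clusters.values())


def low_similarity_split(features, threshold=0.5, test_fraction=0.2):
    """Assign whole clusters to test (smallest first) until the target
    test size is reached; every other cluster goes to train."""
    clusters = sorted(similarity_clusters(features, threshold), key=len)
    target = test_fraction * len(features)
    train, test = [], []
    for cluster in clusters:
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Hypothetical toy data: complex ID -> set of feature indices.
features = {
    "c1": {1, 2, 3, 4},
    "c2": {1, 2, 3, 5},
    "c3": {10, 11, 12},
    "c4": {10, 11, 13},
    "c5": {20, 21},
}
train, test = low_similarity_split(features, threshold=0.5, test_fraction=0.2)
```

Because clusters are connected components, any two complexes at or above the threshold land in the same cluster, so no train/test pair can reach it. A real benchmark would likely combine several similarity measures (for example, protein sequence and ligand structure) rather than a single set-based one.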
