PIGNet2: A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening

Seokhyun Moon, Sang-Yeon Hwang, Jaechang Lim, Woo Youn Kim
2023-07-17
Abstract: Prediction of protein-ligand interactions (PLI) plays a crucial role in drug discovery as it guides the identification and optimization of molecules that effectively bind to target proteins. Despite remarkable advances in deep learning-based PLI prediction, the development of a versatile model capable of accurately scoring binding affinity and conducting efficient virtual screening remains a challenge. The main obstacle in achieving this lies in the scarcity of experimental structure-affinity data, which limits the generalization ability of existing models. Here, we propose a viable solution to address this challenge by introducing a novel data augmentation strategy combined with a physics-informed graph neural network. The model showed significant improvements in both scoring and screening, outperforming task-specific deep learning models in various tests including derivative benchmarks, and notably achieving results comparable to the state-of-the-art performance based on distance likelihood learning. This demonstrates the potential of this approach to drug discovery.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of building a single protein-ligand interaction (PLI) prediction model that serves drug discovery in two ways: accurately scoring binding affinity and efficiently performing virtual screening. Deep learning has advanced PLI prediction considerably, but experimental structure-affinity data remain scarce, and that scarcity limits how well existing models generalize. To tackle this, the paper combines a data augmentation strategy with a physics-informed graph neural network. The augmentation generates data that closely resembles natural structures, which strengthens the model's ability to handle different tasks. The reported results show significant improvements in both scoring and screening: the model outperforms task-specific deep learning models in various tests, including derivative benchmarks, and performs comparably to the state-of-the-art approach based on distance likelihood learning. The goal, in short, is a versatile PLI model that performs well at both scoring and virtual screening despite the limited training data that constrains existing models.