PIGNet: A physics-informed deep learning model toward generalized drug-target interaction predictions

Seokhyun Moon, Wonho Zhung, Soojung Yang, Jaechang Lim, Woo Youn Kim
DOI: https://doi.org/10.48550/arXiv.2008.12249
2021-12-13
Abstract: Recently, deep neural network (DNN)-based drug-target interaction (DTI) models were highlighted for their high accuracy with affordable computational costs. Yet, the models' insufficient generalization remains a challenging problem in the practice of in-silico drug discovery. We propose two key strategies to enhance generalization in the DTI model. The first is to predict the atom-atom pairwise interactions via physics-informed equations parameterized with neural networks and to provide the total binding affinity of a protein-ligand complex as their sum. We further improved the model generalization by augmenting the training data with a broader range of binding poses and ligands. We validated our model, PIGNet, on the comparative assessment of scoring functions (CASF) 2016 benchmark, demonstrating better docking and screening powers than previous methods. Our physics-informing strategy also enables the interpretation of predicted affinities by visualizing the contribution of ligand substructures, providing insights for further ligand optimization.
Biomolecules, Machine Learning
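The pairwise-sum idea described in the abstract can be illustrated with a minimal sketch, assuming a single Lennard-Jones-style 12-6 distance term as a stand-in for PIGNet's physics-informed equations; the actual functional forms used in the paper are not reproduced here. The per-pair parameters `d0` (equilibrium distance) and `eps` (well depth), which in the model would be predicted by neural networks from atom features, are supplied by a hypothetical `params` callable, and more negative totals are read as stronger binding. Summing the energy matrix gives the total affinity, and summing its rows over hypothetical ligand atom groups gives per-substructure contributions of the kind the abstract describes visualizing.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical parameter provider: (ligand_idx, protein_idx) -> (d0, eps).
# In PIGNet such parameters come from neural networks; here it is a stand-in.
ParamFn = Callable[[int, int], Tuple[float, float]]


def lj_pair_energy(r: float, d0: float, eps: float) -> float:
    """12-6 Lennard-Jones-style term with its minimum (-eps) at r == d0."""
    x = d0 / r
    return eps * (x ** 12 - 2.0 * x ** 6)


def pairwise_energies(ligand: Sequence[Vec3], protein: Sequence[Vec3],
                      params: ParamFn) -> List[List[float]]:
    """Energy matrix E[i][j] between ligand atom i and protein atom j."""
    return [[lj_pair_energy(math.dist(a, b), *params(i, j))
             for j, b in enumerate(protein)]
            for i, a in enumerate(ligand)]


def total_energy(E: List[List[float]]) -> float:
    """Predicted binding energy as the plain sum of all pairwise terms."""
    return sum(sum(row) for row in E)


def substructure_contributions(E: List[List[float]],
                               groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Sum each ligand atom's interactions, then aggregate atoms by group."""
    per_atom = [sum(row) for row in E]
    return {name: sum(per_atom[i] for i in idx) for name, idx in groups.items()}


# Toy example with made-up coordinates (angstroms) and constant parameters.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(0.0, 0.0, 4.0), (1.5, 0.0, 4.0), (6.0, 0.0, 0.0)]
E = pairwise_energies(ligand, protein, lambda i, j: (4.0, 0.5))
total = total_energy(E)
contribs = substructure_contributions(E, {"group_a": [0], "group_b": [1]})
print(f"total = {total:.3f}", contribs)
```

Because every term is attached to a specific ligand-protein atom pair, the group contributions add up to the total (up to floating-point rounding) whenever the groups partition the ligand atoms, which is what makes this kind of decomposition interpretable.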
What problem does this paper attempt to address?
The paper addresses the limited generalization of deep-learning models for drug-target interaction (DTI) prediction. Although existing models achieve high accuracy at low computational cost, scarce and unbalanced training data mean they may generalize poorly to unseen data in virtual drug screening, which degrades their performance in practical applications. To address this, the authors propose two key strategies for improving the generalization of DTI models:

1. **Physics-Informed Graph Neural Network (PIGNet)**:
   - Physics-informed equations parameterized with neural networks predict the interactions between atom pairs, and the sum of these interactions is taken as the binding affinity of the protein-ligand complex.
   - This lets the model decompose an unseen protein-ligand pair into combinations of common pairwise interactions, improving its generalization.
   - The model can also estimate the contribution of each molecular substructure to the binding affinity, providing guidance for drug optimization.
2. **Data augmentation strategy**:
   - The training data are expanded with a broader range of binding poses and ligands to improve the model's generalization.
   - Specifically, computationally generated random binding poses are added to the training data, enabling the model to better distinguish stable from unstable binding poses.

Through these two strategies, the authors aim for PIGNet to show better docking and screening powers than existing methods on the CASF-2016 benchmark, and thus greater reliability and efficiency in practical applications such as virtual high-throughput screening (vHTS).
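The pose-augmentation idea can be sketched under two loudly stated assumptions. First, decoy poses are produced by a hypothetical `random_translated_pose` helper that rigidly shifts the native ligand; real decoys would come from a docking or pose-generation program. Second, the model is pushed to score those decoys as less stable than the native pose with a simple margin hinge loss, `pose_margin_loss`. Neither the helper nor the loss form (including the `margin` value) is taken from the paper; the sketch only shows how predictions on generated poses can supply a training signal even though no experimental affinity exists for those poses.

```python
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def random_translated_pose(coords: Sequence[Vec3], max_shift: float,
                           rng: random.Random) -> List[Vec3]:
    """Hypothetical decoy generator: rigidly shift the whole ligand by one random vector."""
    dx, dy, dz = (rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def pose_margin_loss(e_native: float, e_decoys: Sequence[float],
                     margin: float = 1.0) -> float:
    """Mean hinge penalty for decoys predicted less than `margin` above the native pose.

    Convention assumed here: lower (more negative) predicted energy = more stable pose.
    """
    if not e_decoys:
        return 0.0
    return sum(max(0.0, e_native + margin - e) for e in e_decoys) / len(e_decoys)


# Toy example: fixed seed for reproducibility, made-up predicted energies.
rng = random.Random(0)
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
decoys = [random_translated_pose(native, 2.0, rng) for _ in range(3)]
loss = pose_margin_loss(-8.0, [-9.0, -7.5, -5.0])
print(f"loss = {loss:.3f}")
```

A one-sided hinge fits this setting because generated poses carry no measured affinity: the loss only asks that they score worse than the native pose, and it drops to zero once they do so by at least `margin`.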