A new paradigm for applying deep learning to protein–ligand interaction prediction

Zechen Wang, Sheng Wang, Yangyang Li, Jingjing Guo, Yanjie Wei, Yuguang Mu, Liangzhen Zheng, Weifeng Li
DOI: https://doi.org/10.1093/bib/bbae145
IF: 9.5
2024-04-08
Briefings in Bioinformatics
Abstract: Protein–ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein–ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein–ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein–ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses two problems in protein–ligand interaction prediction for drug design. The first is insufficient prediction accuracy. The second is that current scoring systems produce outputs with little practical physical meaning. To address both, the authors propose IGModel, a method based on geometric graph neural networks. It predicts the root-mean-square deviation (RMSD) of a docking pose from the native pose, and it predicts the binding strength of the protein–ligand complex as pKd (the negative base-10 logarithm of the dissociation constant Kd). The aim is to improve accuracy while keeping the output scores physically interpretable, and so to give more reliable support to drug design.

### Main Problems

1. **Accuracy of Protein–Ligand Interaction Prediction**: Current machine-learning and deep-learning models often lack sufficient accuracy when predicting protein–ligand interactions, especially when predicting the binding pose of a ligand.
2. **Physical Meaning of Scoring Systems**: Existing scoring systems lack practical physical meaning when predicting protein–ligand binding strength. This makes the model outputs difficult for researchers to understand and interpret.

### Solutions

The authors propose IGModel, a method based on geometric graph neural networks, with the following main features:

- **Input Data**: IGModel takes two graphs as input. One represents the protein binding pocket, and the other represents the atomic interactions between protein and ligand.
- **Geometric Features**: In addition to interatomic distances, IGModel introduces new directional features (angles and dihedral angles). These describe the relative position of the ligand in the binding pocket more completely.
- **Model Structure**: IGModel has two branches. One processes the binding-pocket graph and the other processes the protein–ligand interaction graph. Each branch contains two EdgeGAT layers that encode the information in its graph.
- **Decoding Module**: RMSD and pKd are predicted by two separate decoding modules.
To ensure that the predicted pKd decreases as RMSD increases, the model also introduces a decay factor W.

### Experimental Results

- **CASF-2016 Benchmark Test**: IGModel performs well on multiple evaluation metrics. It is strongest at identifying near-native poses (docking power), where it achieves the highest Top1 success rate among the compared methods (97.5%).
- **Cross-Docking Test Sets**: IGModel also performs well on cross-docking sets such as PDBbind-CrossDocked-Core and DISCO, which demonstrates its generalization ability and practicality.
- **Virtual Screening Ability**: IGModel is slightly inferior to state-of-the-art models on some virtual screening tasks, but its performance remains relatively balanced.

### Conclusion

IGModel provides a new framework for deep-learning prediction of protein–ligand interactions, especially binding-pose prediction. The method improves prediction accuracy and keeps the output scores physically interpretable, providing strong support for drug design.
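The geometric features listed under Solutions combine distances with angles and dihedral angles. The sketch below computes all three from 3-D atomic coordinates with NumPy to make these quantities concrete. This summary does not specify which atoms IGModel combines into its angle and dihedral features. The function names and the example points are therefore illustrative assumptions, not the paper's featurization.

```python
import numpy as np


def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return float(np.linalg.norm(np.asarray(q, dtype=float) - np.asarray(p, dtype=float)))


def angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    v1, v2 = a - b, c - b
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees, in [-180, 180]."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    b0, b1, b2 = a - b, c - b, d - c
    b1n = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the b-c axis.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Hypothetical example: four atoms in a plane, arranged cis (dihedral ~0 degrees).
print(distance((0, 0, 0), (3, 4, 0)))
print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

Distances alone cannot tell a ligand atom above a pocket residue from one beside it. The angle and dihedral terms carry exactly that orientation information.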
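The EdgeGAT layers named under Model Structure use attention that also sees edge features. This summary does not give the exact EdgeGAT formulation. The NumPy sketch below is therefore a generic single-head, edge-aware graph-attention update, written as a hypothetical function `edge_gat_layer`. For each edge, the attention logit combines the projected features of both endpoint nodes and of the edge itself. Each message adds the projected edge feature to the neighbour's projected node feature. The weight shapes and the message form are assumptions.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Element-wise LeakyReLU, as commonly used for GAT attention logits."""
    return np.where(x > 0, x, slope * x)


def edge_gat_layer(h, e, adj, W, W_e, a):
    """One single-head, edge-aware graph-attention update (generic sketch).

    h:   (N, F)      node features
    e:   (N, N, Fe)  edge features; entries where adj is False get zero attention
    adj: (N, N)      boolean adjacency; adj[i, j] means j sends a message to i.
                     Every row needs at least one True entry.
    W:   (F, D)      node projection
    W_e: (Fe, D)     edge projection
    a:   (3 * D,)    attention vector
    Returns updated node features (N, D) and attention weights (N, N).
    """
    z = h @ W                                   # projected nodes, (N, D)
    ze = e @ W_e                                # projected edges, (N, N, D)
    D = z.shape[1]
    # Logit s_ij = a^T [z_i || z_j || ze_ij], split into three dot products.
    s = (z @ a[:D])[:, None] + (z @ a[D:2 * D])[None, :] + ze @ a[2 * D:]
    s = leaky_relu(s)
    s = np.where(adj, s, -np.inf)               # keep only real edges
    s = s - s.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(s)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over neighbours j
    # Message from j to i: neighbour node features plus edge features.
    messages = z[None, :, :] + ze               # (N, N, D)
    h_new = np.einsum("ij,ijd->id", alpha, messages)
    return h_new, alpha


rng = np.random.default_rng(0)
N, F, Fe, D = 4, 5, 3, 6
adj = np.ones((N, N), dtype=bool)
adj[0, 3] = False                               # node 3 does not send to node 0
out, alpha = edge_gat_layer(rng.normal(size=(N, F)), rng.normal(size=(N, N, Fe)), adj,
                            rng.normal(size=(F, D)), rng.normal(size=(Fe, D)),
                            rng.normal(size=3 * D))
print(out.shape, alpha.sum(axis=1))
```

Stacking two such calls, with a nonlinearity in between, gives a two-layer encoder per branch. This mirrors the two EdgeGAT layers per branch described above. IGModel's readout after the encoders is not described here and is left out.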
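pKd is the negative base-10 logarithm of the dissociation constant Kd (in mol/L), so a 10 nM binder has pKd = 8. The summary says only that a decay factor W makes the predicted pKd fall as the predicted RMSD rises. The sketch below satisfies that constraint using three hypothetical choices: a sigmoid for W, the parameters `r0` and `width`, and a multiplicative coupling. None of these is IGModel's actual definition of W.

```python
import math


def pkd_from_kd(kd_molar):
    """pKd = -log10(Kd), with Kd in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


def decay_factor(rmsd, r0=2.0, width=0.5):
    """Hypothetical decay W: close to 1 for small RMSD, tends to 0 for large RMSD.

    r0 and width are illustrative values, not IGModel parameters.
    """
    z = (rmsd - r0) / width
    # Numerically stable logistic of -z (avoids overflow in math.exp).
    if z >= 0:
        t = math.exp(-z)
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(z))


def coupled_pkd(raw_pkd, predicted_rmsd):
    """Scale a raw pKd prediction by the hypothetical decay factor.

    For a fixed, non-negative raw_pkd, the result decreases monotonically
    as predicted_rmsd grows.
    """
    return decay_factor(predicted_rmsd) * raw_pkd


print(pkd_from_kd(1e-8))        # 10 nM binder
print(coupled_pkd(8.0, 2.0))    # at RMSD == r0 the factor is 0.5
```

A multiplicative factor bounded in [0, 1] keeps a poor pose (high predicted RMSD) from receiving a high affinity score. This is the property the decay factor is meant to guarantee.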
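Docking power, the metric behind the Top1 success rate, asks whether the best-scored pose of each complex lies within an RMSD cutoff of the crystal pose. The usual cutoff in CASF-style benchmarks is 2 Å. The sketch below computes a plain in-place RMSD that assumes identical atom ordering and applies no superposition or symmetry correction. It then computes a top-1 success rate. Ranking poses by lowest score, such as a predicted RMSD, is an assumption about how IGModel's outputs would be used here.

```python
import numpy as np


def pose_rmsd(pose, reference):
    """In-place RMSD between two (n_atoms, 3) coordinate arrays with matching atom order."""
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("pose and reference must have the same shape")
    return float(np.sqrt(np.mean(np.sum((pose - reference) ** 2, axis=1))))


def top1_success_rate(targets, cutoff=2.0):
    """Fraction of complexes whose best-scored pose is near-native.

    targets: list of (scores, true_rmsds) pairs, one per complex.
    A lower score means a better pose (e.g. a predicted RMSD).
    """
    hits = 0
    for scores, true_rmsds in targets:
        best = int(np.argmin(scores))
        if true_rmsds[best] < cutoff:
            hits += 1
    return hits / len(targets)


# Two hypothetical complexes: only the first has its top-ranked pose below 2 Å.
targets = [([0.5, 3.0, 1.2], [0.8, 4.0, 2.5]), ([2.0, 0.4], [5.1, 3.3])]
print(top1_success_rate(targets))
```

Production docking benchmarks usually correct RMSD for molecular symmetry (for example, equivalent atoms in a carboxylate). The naive version above omits that correction for brevity.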