GSScore: a novel Graphormer-based shell-like scoring method for protein–ligand docking

Linyuan Guo, Jianxin Wang
DOI: https://doi.org/10.1093/bib/bbae201
IF: 9.5
2024-05-08
Briefings in Bioinformatics
Abstract: Protein–ligand interactions (PLIs) are essential for cellular activities and drug discovery. But due to the complexity and high cost of experimental methods, there is a great demand for computational approaches to recognize PLI patterns, such as protein–ligand docking. In recent years, more and more models based on machine learning have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are pressingly needed in methodology for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein–ligand docking poses based on a Graphormer method and Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then establishes the docking interface of protein–ligand into multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets including a subset of PDBBind version 2019, CASF2016 as well as DUD-E, and obtained significant improvements over existing methods in terms of RMSE, Pearson correlation coefficient, Spearman correlation coefficient and Docking power.
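The quantity GSScore is trained to predict, the RMSD of a docked pose relative to the native pose, can be illustrated with a minimal sketch. The function name `rmsd` and the toy coordinates are illustrative assumptions, not taken from the paper. The sketch also assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition or symmetry correction is applied.

```python
import math


def rmsd(pose, native):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumption: atoms are matched by index and both poses are already in
    the same coordinate frame, so no alignment is performed.
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    # Sum of squared coordinate differences over all atoms and all three axes.
    squared_error = sum(
        (a - b) ** 2
        for atom_pose, atom_native in zip(pose, native)
        for a, b in zip(atom_pose, atom_native)
    )
    return math.sqrt(squared_error / len(pose))


# Toy example (hypothetical coordinates): every atom is shifted 2 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(docked, native))  # 2.0
```

A lower value means the docked pose lies closer to the native binding pose, which is why a scoring method that predicts RMSD accurately can be used to rank candidate poses.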
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the issue of recognizing near-native conformations in protein–ligand docking. Specifically, the authors propose a new scoring method called GSScore, which is based on the Graphormer method and a shell-like graph architecture, to predict the Root Mean Square Deviation (RMSD) of protein–ligand docking poses. Through this method, GSScore can model the protein–ligand docking interface as multiple bipartite graphs within different shell ranges, effectively capturing the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses.

### Main Contributions

1. **Proposing GSScore**: Combining the Graphormer method and shell-like graph architecture, it can identify near-native conformations from a large number of docking poses.
2. **Feature Extraction**: Extracting complex nonlinear combination features of the protein–ligand interface through a multi-layer shell structure, improving scoring accuracy.
3. **Performance Evaluation**: Extensive evaluation on various datasets, including PDBBind 2019, CASF2016, and DUD-E, showing significantly better performance than existing methods.

### Datasets and Methods

- **Datasets**: Mainly using 17,679 protein–ligand complexes from the PDBBind 2019 database and their corresponding RMSD values for training and testing.
- **Experimental Design**: Cross-validation using multiple datasets and analyzing distribution consistency between different datasets through Jensen-Shannon divergence to ensure good generalization ability of the model.
- **Node and Edge Features**: Defining category features of protein atoms and ligand atoms, as well as different types of edge features, to capture interaction patterns between molecules.
- **Network Architecture**: Based on the Graphormer model, introducing spatial encoding and edge encoding features to enhance the model's ability to capture local structures and global information.
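The idea of modelling the docking interface as multiple bipartite graphs within different shell ranges can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `shell_bipartite_edges`, the shell boundaries, and the toy coordinates are hypothetical, and the summary does not give the paper's actual shell ranges or feature definitions. Each shell keeps only protein–ligand atom pairs whose distance falls in that shell's half-open range, so every edge joins one protein atom to one ligand atom.

```python
import math


def shell_bipartite_edges(protein, ligand, shell_bounds):
    """Group protein-ligand atom pairs into distance shells.

    Returns one edge list per shell; each edge is (protein_index, ligand_index).
    Shell k covers distances in [shell_bounds[k], shell_bounds[k + 1]).
    The boundaries are hypothetical, not values taken from the paper.
    """
    n_shells = len(shell_bounds) - 1
    shells = [[] for _ in range(n_shells)]
    for i, p_atom in enumerate(protein):
        for j, l_atom in enumerate(ligand):
            d = math.dist(p_atom, l_atom)  # Euclidean distance (Python 3.8+)
            for k in range(n_shells):
                if shell_bounds[k] <= d < shell_bounds[k + 1]:
                    shells[k].append((i, j))
                    break  # a pair belongs to at most one shell
    return shells


# Toy interface (hypothetical coordinates, distances in arbitrary units).
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
bounds = [0.0, 2.0, 4.0, 6.0]

for k, edges in enumerate(shell_bipartite_edges(protein, ligand, bounds)):
    print(f"shell {k} [{bounds[k]}, {bounds[k + 1]}): {edges}")
```

In a full model, each shell's graph would also carry the node and edge features listed above before being passed to a Graphormer-style encoder. This sketch covers only the distance-based partitioning.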
Through the above methods, GSScore performs well at recognizing near-native conformations, offering a new scoring approach for protein–ligand docking.
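The Jensen-Shannon divergence mentioned under Experimental Design can be illustrated with a short sketch. The summary does not say which distributions the authors compare, so the inputs here are assumed to be normalized histograms (for example, binned RMSD values from two datasets). The function names are likewise illustrative. With a base-2 logarithm the divergence lies between 0 (identical distributions) and 1 (no overlap).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Assumption: p and q are same-length sequences of non-negative numbers
    that each sum to 1 (e.g. normalized histograms).
    """
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms over the same bins.
train_hist = [0.5, 0.3, 0.2]
test_hist = [0.4, 0.4, 0.2]
print(js_divergence(train_hist, train_hist))  # 0.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(js_divergence(train_hist, test_hist))
```

A small value between a training set and a test set suggests that the two follow similar distributions.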