Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling
Duanhua Cao, Geng Chen, Jiaxin Jiang, Jie Yu, Runze Zhang, Mingan Chen, Wei Zhang, Lifan Chen, Feisheng Zhong, Yingying Zhang, Chenghao Lu, Xutong Li, Xiaomin Luo, Sulin Zhang, Mingyue Zheng
DOI: https://doi.org/10.1038/s42256-024-00849-z
IF: 23.8
2024-06-07
Nature Machine Intelligence
Abstract: Developing robust methods for evaluating protein–ligand interactions has been a long-standing problem. Data-driven methods may memorize ligand and protein training data rather than learning protein–ligand interactions. Here we show a scoring approach called EquiScore, which utilizes a heterogeneous graph neural network to integrate physical prior knowledge and characterize protein–ligand interactions in equivariant geometric space. EquiScore is trained based on a new dataset constructed with multiple data augmentation strategies and a stringent redundancy-removal scheme. On two large external test sets, EquiScore consistently achieved top-ranking performance compared to 21 other methods. When EquiScore is used alongside different docking methods, it can effectively enhance the screening ability of these docking methods. EquiScore also showed good performance on the activity-ranking task of a series of structural analogues, indicating its potential to guide lead compound optimization. Finally, we investigated different levels of interpretability of EquiScore, which may provide more insights into structure-based drug design.
computer science, artificial intelligence, interdisciplinary applications