SQM2.20: Semiempirical quantum-mechanical scoring function yields DFT-quality protein–ligand binding affinity predictions in minutes

Adam Pecina, Jindřich Fanfrlík, Martin Lepšík, Jan Řezáč
DOI: https://doi.org/10.1038/s41467-024-45431-8
IF: 16.6
2024-02-06
Nature Communications
Abstract: Accurate estimation of protein–ligand binding affinity is the cornerstone of computer-aided drug design. We present a universal physics-based scoring function, named SQM2.20, addressing key terms of binding free energy using semiempirical quantum-mechanical computational methods. SQM2.20 incorporates the latest methodological advances while remaining computationally efficient even for systems with thousands of atoms. To validate it rigorously, we have compiled and made available the PL-REX benchmark dataset consisting of high-resolution crystal structures and reliable experimental affinities for ten diverse protein targets. Comparative assessments demonstrate that SQM2.20 outperforms other scoring methods and reaches a level of accuracy similar to much more expensive DFT calculations. In the PL-REX dataset, it achieves excellent correlation with experimental data (average R² = 0.69) and exhibits consistent performance across all targets. In contrast to DFT, SQM2.20 provides affinity predictions in minutes, making it suitable for practical applications in hit identification or lead optimization.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the problem of accurately predicting protein-ligand binding affinity in computer-aided drug design. Existing methods range from simple scoring functions to advanced approaches based on molecular dynamics or demanding quantum-mechanical calculations, and their accuracy generally rises with their computational cost. The authors propose a semiempirical quantum-mechanical scoring function, SQM2.20, which incorporates recent methodological advances and remains efficient even for systems containing thousands of atoms. SQM2.20 requires no fine-tuning for specific targets or types of protein-ligand interaction. Validated on series of ligands for ten different protein targets, it outperforms other scoring methods and reaches accuracy comparable to density functional theory (DFT) calculations while needing only minutes of computation time. This makes it suitable for the hit identification and lead optimization stages of drug discovery. The paper also introduces a benchmark dataset, PL-REX, which combines high-resolution crystal structures with reliable experimental binding affinities and is intended for rigorous validation of scoring methods.
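As an illustration of the headline metric, the sketch below shows one way an average R² over several targets could be computed. It assumes R² means the squared Pearson correlation between computed scores and experimental affinities, calculated per target and then averaged; the paper's exact evaluation protocol is not given here. The function names, target labels, and numbers are made up for the example and are not taken from PL-REX.

```python
from statistics import mean


def r_squared(scores, affinities):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(scores) != len(affinities) or len(scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(scores), mean(affinities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, affinities))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in affinities)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy * sxy / (sxx * syy)


def average_r_squared(per_target):
    """Average the per-target R² values; per_target maps name -> (scores, affinities)."""
    return mean(r_squared(s, a) for s, a in per_target.values())


# Toy data only: hypothetical targets with invented scores and affinities.
toy_data = {
    "target_a": ([1, 2, 3, 4], [2, 4, 6, 8]),
    "target_b": ([1, 2, 3, 4], [1, 3, 2, 4]),
}

if __name__ == "__main__":
    for name, (scores, affinities) in toy_data.items():
        print(name, round(r_squared(scores, affinities), 2))
    print("average", round(average_r_squared(toy_data), 2))
```

Averaging per-target values, rather than pooling all ligands into one correlation, keeps a target with many ligands from dominating the result; whether the paper weights targets this way is an assumption of this sketch.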