The impact of Library Size and Scale of Testing on Virtual Screening

Fangyu Liu, Olivier Mailhot, Isabella S. Glenn, Seth F. Vigneron, Violla Bassim, Xinyu Xu, Karla Fonseca Valencia, Matthew S. Smith, Dmytro S. Radchenko, James S. Fraser, Yurii S. Moroz, John J. Irwin, Brian K. Shoichet
DOI: https://doi.org/10.1101/2024.07.08.602536
2024-07-11
Abstract: Virtual libraries for ligand discovery have recently increased 10,000-fold, and this is thought to have improved hit rates and potencies from library docking. This idea has not, however, been experimentally tested in direct comparisons of larger-vs-smaller libraries. Meanwhile, though libraries have exploded, the scale of experimental testing has little changed, with often only dozens of high-ranked molecules investigated, making interpretation of hit rates and affinities uncertain. Accordingly, we docked a 1.7 billion molecule virtual library against the model enzyme AmpC β-lactamase, testing 1,521 new molecules and comparing the results to the same screen with a library of 99 million molecules, where only 44 molecules were tested. Encouragingly, the larger screen outperformed the smaller one: hit rates improved by two-fold, more new scaffolds were discovered, and potency improved. Overall, 50-fold more inhibitors were found, supporting the idea that there are many more compounds to be discovered than are being tested. With so many compounds evaluated, we could ask how the results vary with number tested, sampling smaller sets at random from the 1,521. Hit rates and affinities were highly variable when we only sampled dozens of molecules, and it was only when we included several hundred molecules that results converged. As docking scores improved, so too did the likelihood of a molecule binding; hit rates improved steadily with docking score, as did affinities. This also appeared true on re-analysis of large-scale results against the σ2 and dopamine D4 receptors. It may be that as the scale of both the virtual libraries and their testing grows, not only are better ligands found but so too does our ability to rank them.
Biophysics
What problem does this paper attempt to address?
This paper examines how library size and the scale of experimental testing affect the outcome of virtual screening. The authors docked a virtual library of 1.7 billion molecules against the enzyme AmpC β-lactamase and compared the results with the same screen run on a library of 99 million molecules. The larger library gave higher hit rates, more new chemical scaffolds, and more potent inhibitors. In the large-scale experimental test, 1,521 molecules were synthesized and tested, and the number of inhibitors found increased linearly with the number of molecules tested. Hit rates and affinities were highly variable when only dozens of molecules were sampled, and converged only once several hundred were included. The paper also suggests that, as virtual libraries and experimental testing scale up, not only are better ligands found but the ability to rank them improves as well. These findings support continuing to expand virtual libraries toward trillions of molecules and re-evaluating docking scoring functions to strengthen the relationship between rank and affinity.
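The subsampling analysis described above (drawing smaller sets at random from the tested molecules and watching how the hit rate varies) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the number of random draws, and in particular the placeholder hit rate of 0.2 are assumptions made for illustration only; the real per-molecule outcomes would come from the experimental data.

```python
import random
import statistics


def make_placeholder_outcomes(n_tested=1521, hit_rate=0.2, seed=0):
    """Build a list of 0/1 outcomes (1 = hit) for n_tested molecules.

    ASSUMPTION: hit_rate=0.2 is a placeholder, not a value from the paper.
    """
    n_hits = round(n_tested * hit_rate)
    outcomes = [1] * n_hits + [0] * (n_tested - n_hits)
    random.Random(seed).shuffle(outcomes)
    return outcomes


def subsample_hit_rates(outcomes, sample_size, n_draws=1000, seed=0):
    """Draw n_draws random subsets (without replacement) of sample_size
    molecules and return the hit rate observed in each subset."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_draws):
        subset = rng.sample(outcomes, sample_size)
        rates.append(sum(subset) / sample_size)
    return rates


if __name__ == "__main__":
    outcomes = make_placeholder_outcomes()
    # Compare the spread of observed hit rates at different testing scales.
    for size in (44, 100, 500, 1000):
        rates = subsample_hit_rates(outcomes, size)
        print(
            f"n={size:5d}  mean={statistics.mean(rates):.3f}  "
            f"sd={statistics.pstdev(rates):.3f}  "
            f"min={min(rates):.3f}  max={max(rates):.3f}"
        )
```

With any fixed underlying hit rate, the spread of the observed rate shrinks as the subset grows, which is the qualitative behavior the abstract reports when moving from dozens to several hundred tested molecules.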