The Impact of Library Size and Scale of Testing on Virtual Screening

Fangyu Liu, Olivier Mailhot, Isabella S. Glenn, Seth F. Vigneron, Violla Bassim, Xinyu Xu, Karla Fonseca Valencia, Matthew S. Smith, Dmytro S. Radchenko, James S. Fraser, Yurii S. Moroz, John J. Irwin, Brian K. Shoichet

DOI: https://doi.org/10.1101/2024.07.08.602536

2024-07-11

Abstract: Virtual libraries for ligand discovery have recently increased 10,000-fold, and this is thought to have improved hit rates and potencies from library docking. This idea has not, however, been experimentally tested in direct comparisons of larger-vs-smaller libraries. Meanwhile, though libraries have exploded, the scale of experimental testing has little changed, with often only dozens of high-ranked molecules investigated, making interpretation of hit rates and affinities uncertain. Accordingly, we docked a 1.7 billion molecule virtual library against the model enzyme AmpC β-lactamase, testing 1,521 new molecules and comparing the results to the same screen with a library of 99 million molecules, where only 44 molecules were tested. Encouragingly, the larger screen outperformed the smaller one: hit rates improved two-fold, more new scaffolds were discovered, and potency improved. Overall, 50-fold more inhibitors were found, supporting the idea that there are many more compounds to be discovered than are being tested. With so many compounds evaluated, we could ask how the results vary with number tested, sampling smaller sets at random from the 1,521. Hit rates and affinities were highly variable when we only sampled dozens of molecules, and it was only when we included several hundred molecules that results converged. As docking scores improved, so too did the likelihood of a molecule binding; hit rates improved steadily with docking score, as did affinities. This also appeared true on re-analysis of large-scale results against the σ2 and dopamine D4 receptors. It may be that as the scale of both the virtual libraries and their testing grows, not only are better ligands found but so too does our ability to rank them.
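The random-subsampling analysis described in the abstract can be illustrated with a minimal sketch. It assumes each tested molecule is recorded simply as a hit (1) or non-binder (0); the function name `subsample_hit_rates`, the pool of 300 hits among 1,521 molecules, and the subset sizes are hypothetical choices for illustration, not values or code from the paper (which also examined affinities, not just hit rates).

```python
import random
import statistics


def subsample_hit_rates(outcomes, n, trials=1000, seed=0):
    """Draw `trials` random subsets of size n (without replacement) from
    `outcomes` (1 = hit, 0 = non-binder) and return each subset's hit rate.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)
    return [sum(rng.sample(outcomes, n)) / n for _ in range(trials)]


# Hypothetical pool: 1,521 tested molecules, of which 300 are assumed hits.
# The 300 is illustrative only; it is not a figure reported in the paper.
pool = [1] * 300 + [0] * 1221

for n in (20, 50, 200, 500, 1521):
    rates = subsample_hit_rates(pool, n)
    print(f"n={n:5d}  mean={statistics.mean(rates):.3f}  "
          f"sd={statistics.pstdev(rates):.3f}  "
          f"range=[{min(rates):.3f}, {max(rates):.3f}]")
```

Because sampling is without replacement, at n = 1,521 every subset is the full pool and the spread collapses to zero; the larger standard deviation at small n shows how far a hit rate measured on a few dozen molecules can drift from the pool-wide value.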

Biophysics

What problem does this paper attempt to address?

This paper examines how library size and the scale of experimental testing affect the results of virtual screening. By comparing the results of docking a 1.7-billion-molecule virtual library against the enzyme AmpC β-lactamase with those from an earlier screen of a 99-million-molecule library, the researchers found that the larger library improved hit rates, revealed more new chemical scaffolds, and yielded more potent inhibitors. In the large-scale experimental campaign, they synthesized and tested 1,521 molecules and found that the number of inhibitors increased linearly with the number of molecules tested. The study also showed that hit rates and affinities are highly variable when only dozens of molecules are tested, and converge only once several hundred molecules are included. Furthermore, the paper suggests that as virtual libraries and experimental testing grow, not only are better ligands found, but the ability of docking to rank them also improves. These findings support continuing to expand virtual libraries toward trillions of molecules and re-evaluating docking scoring functions to strengthen the relationship between docking rank and affinity.
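The abstract's observation that hit rates rose steadily with docking score is the kind of result obtained by binning tested molecules by score and computing a hit rate per bin. The sketch below assumes binary hit labels and DOCK-style energy scores where more negative means better-ranked; the function `hit_rate_by_score_bin`, the bin edges, and the logistic trend used to generate the example data are all hypothetical and do not reproduce the paper's data or analysis.

```python
import bisect
import math
import random


def hit_rate_by_score_bin(records, edges):
    """Group (docking_score, is_hit) records into score bins and return
    (bin_low, bin_high, n_tested, hit_rate) for each non-empty bin.

    `edges` must be sorted ascending; a score s falls in bin i when
    edges[i] <= s < edges[i + 1]. Scores outside the edges are ignored.
    Hypothetical helper for illustration; not the authors' code.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [n_tested, n_hits]
    for score, is_hit in records:
        i = bisect.bisect_right(edges, score) - 1
        if 0 <= i < len(counts):
            counts[i][0] += 1
            counts[i][1] += int(is_hit)
    return [(edges[i], edges[i + 1], n, hits / n)
            for i, (n, hits) in enumerate(counts) if n > 0]


# Synthetic data only: scores drawn uniformly, with an ASSUMED logistic
# trend in which more negative (better) scores are more likely to be hits.
rng = random.Random(1)
records = []
for _ in range(1500):
    score = rng.uniform(-80.0, -50.0)
    p_hit = 0.5 / (1 + math.exp((score + 65.0) / 4.0))
    records.append((score, rng.random() < p_hit))

edges = [-80, -75, -70, -65, -60, -55, -50]
for low, high, n, rate in hit_rate_by_score_bin(records, edges):
    print(f"[{low}, {high})  n={n:4d}  hit rate={rate:.2f}")
```

Using `bisect_right` gives half-open bins `[low, high)`, so a score lying exactly on an edge is counted once, in the bin that starts at that edge; scores outside the outer edges are dropped rather than clamped into the end bins.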
