The Pan-Canadian Chemical Library: A Mechanism to Open Academic Chemistry to High-Throughput Virtual Screening

Corentin Bedart, Grace Shimokura, Frederick G. West, Tabitha E. Wood, Robert A. Batey, John J. Irwin, Matthieu Schapira
DOI: https://doi.org/10.1038/s41597-024-03443-5
2024-06-07
Scientific Data
Abstract: Computationally screening chemical libraries to discover molecules with desired properties is a common technique used in early-stage drug discovery. Recent progress in the field now enables the efficient exploration of billions of molecules within days or hours, but this exploration remains confined within the boundaries of the accessible chemistry space. While the number of commercially available compounds grows rapidly, it remains a limited subset of all druglike small molecules that could be synthesized. Here, we present a workflow where chemical reactions typically developed in academia and unconventional in drug discovery are exploited to dramatically expand the chemistry space accessible to virtual screening. We use this process to generate a first version of the Pan-Canadian Chemical Library, a collection of nearly 150 billion diverse compounds that does not overlap with other ultra-large libraries such as Enamine REAL or SAVI and could be a resource of choice for protein targets where other libraries have failed to deliver bioactive molecules.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses a limitation of virtual screening in drug discovery: although advances in computing and the growth of chemical libraries now make it possible to screen billions of molecules efficiently, the search is still confined to the accessible chemical space. Even as the number of commercially available compounds grows, it represents only a small fraction of all drug-like small molecules that could be synthesized. The authors propose a workflow that uses chemical reactions typically developed in academic laboratories, but rarely used in drug discovery, to greatly expand the chemical space available for virtual screening. With this approach they generated the first version of the Pan-Canadian Chemical Library (PCCL), a collection of nearly 150 billion diverse compounds that does not overlap with existing ultra-large libraries such as Enamine REAL or SAVI, and that may serve as a resource for protein targets where other libraries have failed to yield bioactive molecules.
The PCCL is built from chemical reactions developed in academic laboratories at the University of Toronto, the University of Manitoba, and the University of Alberta, combined with compatible reagents from the ZINC database. This yields up to 148 billion synthesizable compounds, of which 401 million are low-cost. These low-cost, drug-like molecules are as diverse as commercial catalogs in physicochemical properties, three-dimensional shape, and chemical scaffolds, yet have almost no overlap with other existing libraries.
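The size of a reaction-based library like this follows from simple combinatorics: for each reaction, the number of candidate products is the product of the number of compatible reagents for each reactant slot, and the library size is the sum over reactions. The sketch below illustrates that arithmetic and a basic overlap check against another library. It is a minimal, hypothetical example: the reaction names, reagent IDs, and SMILES strings are invented toy data, it does not build real product structures (a real workflow would apply reaction templates, for instance with RDKit), and it is not the authors' pipeline.

```python
from itertools import product
from math import prod

# Hypothetical toy data: each reaction maps to one list of compatible reagent IDs
# per reactant slot. These are NOT real PCCL reactions or ZINC reagents.
reactions = {
    "rxn_A": [["a1", "a2", "a3"], ["b1", "b2"]],          # two-component reaction
    "rxn_B": [["c1", "c2"], ["d1", "d2"], ["e1", "e2"]],  # three-component reaction
}

def library_size(reactions):
    """Upper bound on library size: sum over reactions of the product of
    reagent counts per slot. Different combinations could give the same
    compound, so the number of distinct products can be smaller."""
    return sum(prod(len(slot) for slot in slots) for slots in reactions.values())

def enumerate_products(reactions):
    """Yield a (reaction, reagent, reagent, ...) identifier for every combination.
    A real workflow would apply a reaction template to obtain product structures."""
    for name, slots in reactions.items():
        for combo in product(*slots):
            yield (name,) + combo

def overlap_fraction(library, other):
    """Fraction of `library` entries also present in `other`, assuming both are
    given as comparable identifiers (e.g. canonical SMILES or InChIKeys)."""
    library = set(library)
    return len(library & set(other)) / len(library) if library else 0.0

print(library_size(reactions))                  # 3*2 + 2*2*2 = 14
print(len(list(enumerate_products(reactions))))  # 14
print(overlap_fraction(["CCO", "CCN", "CCC", "CCCl"], ["CCO", "c1ccccc1"]))  # 0.25
```

Because the count multiplies reagent numbers across slots, adding even a modest number of new reactions with broad reagent compatibility can grow a library by billions of compounds, which is why the size is quoted as "up to" a given number before deduplication and filtering.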
This study highlights the potential of leveraging innovative chemistry from the academic community to expand the chemical space accessible to drug discovery and other fields. The library may also be a valuable resource for Target 2035, an initiative that aims to develop a pharmacological modulator for every human protein by 2035, helping to explore understudied regions of the proteome and reveal new opportunities for precision medicine.