Path Toward High-Throughput Synthesis Planning via Performance Benchmarking

Gergely Zahoránszky-Kőhalmi, Alexander G. Godfrey, Samuel G. Michael, Thierry Masquelin, Andrew Girvin, Hugo Hernandez, Jeyaraman Soundararajan, Nathan Miller, Brett Yang, Eduardo Luiggi Lopez, Jennifer King, Dhatri V. L. Penna, Sridhar Vuyyuru, Ilia Vorontcov, Amin Mannaa, Maya Choudhury, Hailey Fox, Mihir Bafna, Brandon Walker
DOI: https://doi.org/10.26434/chemrxiv-2024-pmjn8
2024-08-12
Abstract: Rapid generation and evaluation of diverse synthesis pathways play a critical role in exploring a broader chemical space and identifying potent drug candidates. Drug discovery often relies on labor-intensive manual processes for retrosynthetic route finding, which creates challenges for scalability and reproducibility. Autonomous chemical synthesis platforms, such as ASPIRE, aim to address this bottleneck by developing high-throughput synthesis capabilities. While AI/ML-based predictive methods can generate synthesis routes rapidly, evidence-based synthesis route search, which often relies on knowledge graphs, poses its own scalability challenges. In this study, we present a comprehensive benchmarking framework and analysis applied to the ASPIRE Integrated Computational Platform (AICP) that led to a breakthrough in high-throughput synthesis planning. Our strategy combines query optimization and domain-driven data engineering techniques, which worked in concert to reduce synthesis route finding time by orders of magnitude. As a result, AICP is equipped with a high-throughput, evidence-based, computer-assisted synthesis planning method that can automatically identify viable synthesis routes to 2000 target molecules within approximately 40 minutes. Complementing existing retrosynthetic approaches and drawing on a knowledge graph of 1.2M chemical reactions, AICP represents a significant advance toward automating high-throughput synthesis in drug discovery, paving the way for more efficient drug candidate identification and development.
Chemistry