Site-Specific Template Generative Approach for Retrosynthetic Planning

Yu Shee, Haote Li, Pengpeng Zhang, Andrea Nikolic, Sanil Sreekumar, Frédéric Buono, Jinhua Song, Timothy Newhouse, Victor Batista
DOI: https://doi.org/10.26434/chemrxiv-2024-zscw8
2024-04-25
Abstract: Retrosynthesis, the strategy of devising laboratory pathways for small molecules by working backwards from the target compound, remains a rate-limiting step in multi-step synthesis of complex molecules, particularly in drug discovery. Enhancing retrosynthetic efficacy requires overcoming the vast complexity of chemical space, the limited known interconversions between molecules, and the challenges posed by limited experimental datasets. In this study, we introduce generative machine learning methods for retrosynthetic planning that generate reaction templates. Our approach features three key innovations. First, the models generate complete reactions, known as templates, instead of reactants or synthons. Through this abstraction, novel chemical transforms resembling those in the training dataset can be generated. Second, the approach optionally allows users to select the specific bond or bonds to be changed in the proposed reaction, enabling human interaction to influence the synthetic approach. Third, one of our models, based on the conditional kernel-elastic autoencoder (CKAE) architecture, employs a latent space to measure the similarity between generated and known reactions, providing insights into their chemical viability. Together, these features establish a coherent framework for retrosynthetic planning, as validated by our experimental work. We demonstrate the application of our machine learning methodology to design a synthetic pathway for a simple yet challenging small molecule of pharmaceutical interest. The pathway was experimentally proven viable through a 3-step process, which compares favorably to previous 5- to 9-step approaches. This improvement demonstrates the utility and robustness of the generative machine learning approaches described herein and highlights their potential to address a broad spectrum of challenges in chemical synthesis.
Chemistry
### What problems does this paper attempt to solve?

This paper addresses several key challenges in retrosynthesis, particularly the bottlenecks encountered in the multi-step synthesis of complex molecules during drug discovery. Specifically, it targets the following problems:

1. **Complexity of chemical space**: Retrosynthetic planning must navigate an extremely large and complex chemical space, which makes finding a suitable synthetic route very difficult.
2. **Limited known interconversions between molecules**: Existing retrosynthetic methods rely on known transformations between molecules, but these cover only a fraction of all possible chemical reactions.
3. **Limited experimental datasets**: Because experimental datasets are limited, many potential chemical reactions remain unexplored or unverified, which further complicates retrosynthetic planning.

To address these problems, the authors introduce generative machine learning methods for retrosynthetic planning that generate reaction templates. The approach has three main innovations:

1. **Generation of complete reaction templates**: Rather than generating only reactants or synthons, the models generate complete reactions in the form of templates. Through this abstraction, novel chemical transformations resembling those in the training dataset can be generated.
2. **User-specified bond changes**: Users can optionally select the specific bond or bonds to be changed in the proposed reaction, allowing human input to steer the synthetic strategy.
3. **Conditional kernel-elastic autoencoder (CKAE) architecture**: One of the models is based on the CKAE architecture, which uses a latent space to measure the similarity between generated and known reactions, providing insight into their chemical viability.
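The bond-selection and latent-similarity ideas above can be illustrated with a toy post-processing step. This is a minimal sketch under stated assumptions, not the authors' implementation: each generated template is represented by a placeholder name, a hypothetical set of changed bonds (pairs of atom-map numbers), and a hypothetical latent vector with made-up values. In the paper, the selected bond or bonds condition the generative model itself; here a simple subset filter stands in for that conditioning. Similarity to known reactions is approximated as the Euclidean distance to the nearest known latent vector, which is an assumption, since the summary does not state which distance the CKAE uses. All names (`Template`, `bond`, `touches_site`, `nearest_known_distance`, `rank_candidates`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one generated retro-template (toy data only)."""
    name: str                   # placeholder label, not a real SMARTS string
    reaction_center: frozenset  # bonds changed by the template (see bond())
    latent: tuple               # assumed latent vector for this template


def bond(a: int, b: int) -> frozenset:
    """Order-independent key for the bond between atom-map numbers a and b."""
    return frozenset((a, b))


def touches_site(t: Template, selected: set) -> bool:
    """True if every user-selected bond lies in the template's reaction center.

    Stand-in for site conditioning; the paper conditions generation instead.
    """
    return selected <= t.reaction_center


def nearest_known_distance(z: tuple, known_latents: list) -> float:
    """Smallest Euclidean distance from z to any known reaction's latent vector."""
    return min(math.dist(z, k) for k in known_latents)


def rank_candidates(generated: list, known_latents: list, selected: set) -> list:
    """Keep site-matching templates, sorted by distance to known reactions."""
    kept = [t for t in generated if touches_site(t, selected)]
    scored = [(nearest_known_distance(t.latent, known_latents), t) for t in kept]
    scored.sort(key=lambda pair: pair[0])  # closest to known chemistry first
    return scored


if __name__ == "__main__":
    known = [(0.0, 0.0), (1.0, 1.0)]  # toy latent vectors of known reactions
    generated = [
        Template("tA", frozenset({bond(1, 2)}), (0.1, 0.0)),
        Template("tB", frozenset({bond(1, 2), bond(3, 4)}), (3.0, 4.0)),
        Template("tC", frozenset({bond(5, 6)}), (0.0, 0.0)),
    ]
    # User selects the bond between atom maps 1 and 2; tC is filtered out.
    for dist, t in rank_candidates(generated, known, {bond(1, 2)}):
        print(t.name, round(dist, 3))
    # tA 0.1
    # tB 3.606
```

In this toy setup a smaller distance marks a generated template that sits closer to known chemistry in latent space; whether such a score actually tracks chemical viability depends on how the latent space was trained.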
Together, these innovations establish a coherent retrosynthetic planning framework, which the authors validate experimentally. The paper applies the machine learning methodology to design a synthetic route to a simple but challenging small molecule of pharmaceutical interest, and the route was experimentally shown to be viable in three steps, compared with previous five- to nine-step approaches. This result suggests that generative machine learning methods have broad potential in chemical synthesis, where they can reduce the number of synthetic steps and improve efficiency.