Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design

Wenhao Gao, Rocío Mercado, Connor W. Coley
DOI: https://doi.org/10.48550/arXiv.2110.06389
2022-03-13
Abstract: Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach to generate synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and design synthesizable molecules by decoding from optimized conditional codes, demonstrating the potential to solve both problems of design and synthesis simultaneously. The approach leverages neural networks to probabilistically model the synthetic trees, one reaction step at a time, according to reactivity rules encoded in a discrete action space of reaction templates. We train these networks on hundreds of thousands of artificial pathways generated from a pool of purchasable compounds and a list of expert-curated templates. We validate our method with (a) the recovery of molecules using conditional generation, (b) the identification of synthesizable structural analogs, and (c) the optimization of molecular structures given oracle functions relevant to drug discovery.
Machine Learning, Quantitative Methods
### What problems does this paper attempt to solve?

This paper addresses two crucial steps in molecular discovery: molecular design and synthesis planning. The authors formulate both as a single shared task, conditional synthetic pathway generation, so that design and synthesis planning can be solved simultaneously.

#### Main problems

1. **Molecular design**:
   - Design new molecules with specific properties.
   - Ensure that these molecules are synthesizable, that is, they can be prepared from purchasable starting materials through known chemical reactions.
2. **Synthesis planning**:
   - Plan a chemical synthesis path from purchasable starting materials to the target molecule.
   - Ensure that this path is practical and conforms to the rules of chemical reactions.

#### Method innovation

- **Jointly solve design and synthesis**: Molecular design and synthesis planning are usually treated as two independent processes. The proposed method plans the synthesis path while generating the molecule, so every generated molecule comes with a pathway that supports its synthesizability.
- **Generation model based on a Markov decision process (MDP)**: The authors use neural networks to model the generation of the synthetic tree one reaction step at a time, treating it as a Markov decision process. Each reaction step is selected from a discrete action space of predefined, expert-curated reaction templates, which keeps the generated path consistent with known reactivity rules.
- **Conditional generation**: Given the embedding of a target molecule, the model generates a synthetic tree conditioned on it. This enables bottom-up synthesis planning: the pathway is built forward from purchasable starting materials toward the target, rather than deduced backward from the target as in conventional retrosynthesis.
- **Optimize molecular structure**: By optimizing the conditional molecular embedding and decoding it into a synthetic tree, the model can search for synthesizable molecules that score well on a given property.

#### Experimental verification

The authors validate the method with the following experiments:

- **Molecule reconstruction**: Test whether the model can recover known molecules through conditional generation.
- **Synthesizable analogue recommendation**: For target molecules that the model cannot reach directly, test whether it can recommend structurally similar, synthesizable molecules.
- **Multi-objective optimization**: Evaluate how well the model optimizes molecular structures under oracle functions relevant to drug discovery.

In conclusion, this paper addresses key challenges in molecular design and synthesis planning by proposing a conditional synthetic pathway generation method, offering a useful tool for automated molecular discovery.
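The bottom-up, conditional generation idea summarized above can be illustrated with a toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: molecules are plain strings, the embedding is a vector of letter counts standing in for a molecular fingerprint, the `Template` class and the building-block list are hypothetical, and a greedy nearest-product rule replaces the paper's neural networks as the policy. Only the overall shape follows the summary: start from a purchasable building block, apply one template per step, and stop when no action moves the product closer to the target embedding.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALPHABET = "ABCD"  # toy "atom types"; purely illustrative


def embed(mol: str) -> Tuple[int, ...]:
    """Toy embedding: letter counts, standing in for a molecular fingerprint."""
    counts = Counter(mol)
    return tuple(counts[ch] for ch in ALPHABET)


def distance(u: Tuple[int, ...], v: Tuple[int, ...]) -> int:
    """L1 distance between two toy embeddings."""
    return sum(abs(a - b) for a, b in zip(u, v))


@dataclass(frozen=True)
class Template:
    """Hypothetical reaction template joining an intermediate and a building block."""
    name: str
    left_end: str     # letter the current intermediate must end with
    right_start: str  # letter the building block must start with

    def apply(self, mol: str, block: str) -> Optional[str]:
        """Return the product, or None if the reactivity rule does not match."""
        if mol.endswith(self.left_end) and block.startswith(self.right_start):
            return mol + block
        return None


BUILDING_BLOCKS = ["AB", "BA", "CD", "DC", "AC"]  # toy "purchasable compounds"
TEMPLATES = [
    Template("t1", "B", "C"),
    Template("t2", "D", "A"),
    Template("t3", "C", "D"),
]

Step = Tuple[str, str, str]  # (action or template name, building block, product)


def generate_pathway(target: str, max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Greedily build a pathway whose product approaches the target embedding."""
    target_vec = embed(target)
    # First action: pick the building block closest to the target embedding.
    state = min(BUILDING_BLOCKS, key=lambda b: distance(embed(b), target_vec))
    pathway: List[Step] = [("add", state, state)]
    for _ in range(max_steps):
        best = None
        # Enumerate the discrete action space: (template, building block) pairs.
        for template in TEMPLATES:
            for block in BUILDING_BLOCKS:
                product = template.apply(state, block)
                if product is None:
                    continue  # template not applicable to this pair
                d = distance(embed(product), target_vec)
                if best is None or d < best[0]:
                    best = (d, template.name, block, product)
        # "End" action: stop when no applicable step gets closer to the target.
        if best is None or best[0] >= distance(embed(state), target_vec):
            break
        _, name, block, state = best
        pathway.append((name, block, state))
    return state, pathway


if __name__ == "__main__":
    product, pathway = generate_pathway("ABCDAC")
    print(product)
    for step in pathway:
        print(step)
```

When the target is reachable from the toy building blocks, the sketch recovers it together with its pathway. When it is not, the loop stops at the closest product it found, which loosely mirrors the idea of proposing a synthesizable analogue instead of the exact target.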