Directly Optimizing for Synthesizability in Generative Molecular Design using Retrosynthesis Models

Jeff Guo, Philippe Schwaller
2024-07-17
Abstract: Synthesizability in generative molecular design remains a pressing challenge. Existing methods to assess synthesizability span heuristics-based methods, retrosynthesis models, and synthesizability-constrained molecular generation. The latter has become increasingly prevalent and proceeds by defining a set of permitted actions a model can take when generating molecules, such that all generations are anchored in "synthetically-feasible" chemical transformations. To date, retrosynthesis models have been mostly used as a post-hoc filtering tool as their inference cost remains prohibitive to use directly in an optimization loop. In this work, we show that with a sufficiently sample-efficient generative model, it is straightforward to directly optimize for synthesizability using retrosynthesis models in goal-directed generation. Under a heavily-constrained computational budget, our model can generate molecules satisfying a multi-parameter drug discovery optimization task while being synthesizable, as deemed by the retrosynthesis model.
Biomolecules
What problem does this paper attempt to address?
This paper addresses the challenge of synthesizability in generative molecular design. Generative models can propose molecules with good predicted properties, but many of these molecules have no feasible synthesis route, which limits their practical value. The authors propose optimizing synthesizability directly during goal-directed generation: a retrosynthesis model serves as the evaluation tool, and a highly sample-efficient generative model makes its inference cost affordable under a strict computational budget. The generated molecules satisfy a multi-parameter drug discovery optimization task while being synthesizable, as judged by the retrosynthesis model.

### Summary of the core problems in the paper:

1. **Synthesizability evaluation**: Existing methods include heuristics-based evaluation, retrosynthesis models, and synthesizability-constrained molecular generation. Because of their high inference cost, retrosynthesis models are usually used only for post-hoc filtering.
2. **Sample efficiency**: To optimize the objective function within a limited computational budget, the model must make efficient use of every sample. This matters most when the objective includes expensive property predictions, such as binding affinity prediction.
3. **Direct optimization of synthesizability**: The authors use a retrosynthesis model directly as an evaluation tool in goal-directed generation, so that generated molecules are optimized toward having a synthesis route the model can find.

### Method overview:

- Use Saturn, a language-based molecular generative model with high sample efficiency.
- Integrate AiZynthFinder (a retrosynthesis tool) into Saturn to evaluate the synthesizability of generated molecules.
- Define two objective functions: $R_{\text{All MPO}}$ (jointly optimizes docking score, QED, SA score, and synthesizability) and $R_{\text{Double MPO}}$ (optimizes only docking score and synthesizability).
- Run experiments under a strict computational budget (1,000 oracle calls) and compare with existing methods such as RGFN.

### Experimental results:

- By directly optimizing synthesizability, Saturn generates molecules that the retrosynthesis model deems synthesizable within 1,000 oracle calls.
- Compared with RGFN, Saturn is more efficient under the same computational budget, and the generated molecules have good docking scores while also meeting the other physicochemical property requirements.
- Even when starting from an unsuitable training distribution, Saturn can still gradually optimize the synthesizability of generated molecules through curriculum learning.

In conclusion, this paper proposes an effective method for directly optimizing synthesizability in generative molecular design, which increases the practical value of the generated molecules.
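
To make the objective functions and the oracle budget from the method overview concrete, here is a minimal Python sketch. It is not the authors' implementation. The equal-weight geometric mean, the caching of repeated molecules, and every name in it (`BudgetedOracle`, `stub_docking`, `stub_retrosynthesis`) are assumptions made for illustration, since the summary does not say how the components are aggregated or how calls are counted. The stubs stand in for a real docking program and for a retrosynthesis tool such as AiZynthFinder, and each returns a score already normalized to [0, 1].

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict


def geometric_mean(scores: Dict[str, float]) -> float:
    """Equal-weight geometric mean of scores clamped to [0, 1].

    Assumption: the aggregation rule is illustrative only. With a product-style
    aggregate, any component scoring 0 (e.g. no route found) zeroes the reward.
    """
    values = [max(0.0, min(1.0, v)) for v in scores.values()]
    if not values or any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


@dataclass
class BudgetedOracle:
    """Hypothetical wrapper that scores SMILES strings under a fixed call budget."""

    components: Dict[str, Callable[[str], float]]
    budget: int = 1000  # the summary reports a budget of 1,000 oracle calls
    calls: int = 0
    cache: Dict[str, float] = field(default_factory=dict)

    def __call__(self, smiles: str) -> float:
        # Assumption: repeated molecules are served from a cache at no cost.
        if smiles in self.cache:
            return self.cache[smiles]
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        reward = geometric_mean(
            {name: score(smiles) for name, score in self.components.items()}
        )
        self.cache[smiles] = reward
        return reward


def stub_docking(smiles: str) -> float:
    """Placeholder for a normalized docking score; NOT a real docking call."""
    return 0.8


def stub_retrosynthesis(smiles: str) -> float:
    """Placeholder for a retrosynthesis check; NOT a real AiZynthFinder call.

    Returns 1.0 if a route is "found" and 0.0 otherwise. The length test is
    arbitrary and exists only to make the sketch runnable.
    """
    return 1.0 if len(smiles) <= 20 else 0.0


# A Double-MPO-style objective: docking score and synthesizability only.
double_mpo = BudgetedOracle(
    components={"docking": stub_docking, "synthesizability": stub_retrosynthesis}
)
reward = double_mpo("CCO")
```

An All-MPO-style objective would pass two more entries in `components` (for example QED and SA score scorers) to the same wrapper. The practical consequence of the budget is visible in `__call__`: every new molecule costs one of the 1,000 calls, which is why the sample efficiency of the generator determines whether a retrosynthesis model can sit inside the optimization loop at all.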