Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu
2024-06-04
Abstract: Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on possible criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs), that focus on the quality of the entire synthetic route based on the specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.
Machine Learning, Biomolecules
What problem does this paper attempt to address?
This paper addresses molecule synthesis in drug discovery, specifically multi-step retrosynthesis planning with machine learning. Current methods combine one-step retrosynthesis models with search algorithms to predict synthetic routes top-down, starting from the target molecule. They have two limitations: they select the next set of molecules greedily, without lookahead, and they cannot steer route generation toward criteria such as material cost, yield, and step count. The paper proposes a framework based on conditional residual energy-based models (CREBM) that evaluates the quality of the entire synthetic route against specified criteria. By adding an energy function to the probabilistic model, the proposed algorithm improves the quality of the most probable routes generated by various strategies in a plug-and-play manner. Experiments show that the framework consistently improves the performance of different strategies and exceeds the previous state-of-the-art top-1 accuracy by 2.5%.
Existing strategies often overlook factors such as material cost, number of steps, and feasibility, all of which matter in practical retrosynthesis planning. The paper points out that existing evaluation metrics mainly measure how often the shortest path is found, but do not check whether the predicted starting materials can actually undergo the required reactions to synthesize the target molecule. To overcome these issues, the paper proposes new evaluation criteria and methods. The paper also discusses the problem of local normalization, which can make the model behave inconsistently between training and testing and can prevent it from fully accounting for the long-term impact of choices along a synthesis route.
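The contrast between local normalization and a residual energy term can be written out as follows. This is a sketch under assumed notation, none of which appears in the summary itself: $m$ is the target molecule, $c$ the criteria, $T = (s_1, \dots, s_K)$ a route built step by step, $p$ the distribution of the base strategy, and $E_\theta$ the energy function.

```latex
% Locally normalized: every step is normalized on its own, given the prefix
p(T \mid m) = \prod_{k=1}^{K} p(s_k \mid s_{<k}, m)

% Residual energy-based form: normalized once, over whole routes
p_\theta(T \mid m, c)
  = \frac{p(T \mid m)\,\exp\!\big(-E_\theta(T, m, c)\big)}{Z_\theta(m, c)},
\qquad
Z_\theta(m, c) = \sum_{T'} p(T' \mid m)\,\exp\!\big(-E_\theta(T', m, c)\big)
```

Because $Z_\theta(m, c)$ does not depend on $T$, comparing candidate routes for the same target and criteria only requires $\log p(T \mid m) - E_\theta(T, m, c)$.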
By introducing energy functions, the paper's model can guide route generation according to multiple criteria, which makes the generation controllable. In conclusion, the paper addresses the neglect of specific criteria in synthetic route generation by using a conditional residual energy-based model to improve the quality of the generated routes and bring them closer to practical chemical manufacturing needs.
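The plug-and-play idea can be illustrated with a small re-ranking sketch: candidate routes from any base strategy keep their base log-probability, and an energy term is subtracted before ranking. Everything here is an assumption made for illustration: the `Route` fields, the hand-written linear `energy` (the paper's energy is a trained model), the weights, and the three toy candidates. It is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    name: str            # hypothetical label for a candidate synthetic route
    base_logprob: float  # log-probability assigned by the base strategy (assumed given)
    cost: float          # hypothetical material cost; lower is better
    steps: int           # hypothetical number of reaction steps


def energy(route: Route, w_cost: float = 1.0, w_steps: float = 0.5) -> float:
    """Toy energy: lower means the route fits the criteria better.

    A stand-in for a learned energy function; the linear form and the
    weights are assumptions, not taken from the paper.
    """
    return w_cost * route.cost + w_steps * route.steps


def rerank(routes):
    """Rank routes by base log-probability minus energy.

    Returns (route, probability) pairs, where the probabilities are
    normalized over the given candidate set only.
    """
    scores = [r.base_logprob - energy(r) for r in routes]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    pairs = [(r, w / total) for r, w in zip(routes, weights)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Toy candidates: the base strategy prefers A, but A is costly and long.
candidates = [
    Route("A", base_logprob=-1.0, cost=3.0, steps=4),
    Route("B", base_logprob=-1.5, cost=1.0, steps=2),
    Route("C", base_logprob=-2.0, cost=2.0, steps=3),
]

ranked = rerank(candidates)
for route, prob in ranked:
    print(f"{route.name}: {prob:.3f}")
```

With these toy numbers the base strategy ranks A first, while the re-ranked order is B, C, A, since B is both cheaper and shorter. Changing the weights in `energy` is the simplest way to see how different criteria shift the ranking.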