RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation

Yiming Wang, Yuxuan Song, Minkai Xu, Rui Wang, Hao Zhou, Weiying Ma
2023-11-24
Abstract: Retrosynthesis poses a fundamental challenge in biopharmaceuticals, aiming to aid chemists in finding appropriate reactant molecules and synthetic pathways given determined product molecules. With the reactant and product represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph generative task. Inspired by the recent advancements in discrete diffusion models for graph generation, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method designed to address this problem. However, integrating a diffusion-based graph-to-graph framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation is to develop a multi-stage diffusion process. In this method, we decompose the retrosynthesis procedure to first sample external groups from the dummy distribution given products and then generate the external bonds to connect the products and generated groups. Interestingly, such a generation process is exactly the reverse of the widely adopted semi-template retrosynthesis procedure, i.e., from reaction center identification to synthon completion, which significantly reduces the error accumulation. Experimental results on the benchmark have demonstrated the superiority of our method over all other semi-template methods.
Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses retrosynthesis in organic synthesis: given a target product molecule, find suitable reactant molecules and synthesis pathways. Retrosynthesis is considered a fundamental challenge in the biopharmaceutical field. To tackle it, the paper proposes Retrosynthesis Diffusion (RetroDiff), a conditional graph-to-graph generation model based on discrete diffusion.

Existing methods are usually grouped into three categories: template-based, template-free, and semi-template. RetroDiff is a new variant of the semi-template approach. Its key innovation is a multi-stage diffusion process that decomposes retrosynthesis into two parts: first, sampling external groups from a dummy distribution conditioned on the product molecule, and then generating the external bonds that connect the product to those groups. This order is the reverse of the conventional semi-template procedure, which goes from reaction center identification to synthon completion, and the paper reports that the reversal significantly reduces error accumulation.

Experimental results on the USPTO-50k dataset show that RetroDiff outperforms all other semi-template methods. It also significantly improves the accuracy of reaction center prediction, which the summary attributes to the chemical information gained from the external groups generated in the first stage.
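The two-stage ordering described above can be illustrated with a toy sketch. This is not the RetroDiff implementation: the atom vocabulary, the function names, and the random choices standing in for a learned denoising network are all assumptions made for illustration. The only thing it is meant to show is the order of generation, with external groups sampled first from a dummy prior and external bonds generated second.

```python
import random

# Hypothetical atom vocabulary; "DUMMY" marks an empty slot in the prior.
ATOM_TYPES = ["C", "N", "O", "DUMMY"]


def sample_external_group(product_atoms, max_group_size, rng):
    """Stage 1 (toy): start from an all-dummy prior and fill in atom types.

    A real model would condition each denoising step on the product graph;
    here a uniform random choice stands in for the learned denoiser, so
    product_atoms is accepted but not used.
    """
    slots = ["DUMMY"] * max_group_size          # dummy prior over group atoms
    slots = [rng.choice(ATOM_TYPES) for _ in slots]
    return [atom for atom in slots if atom != "DUMMY"]


def sample_external_bonds(product_atoms, group_atoms, rng, p_bond=0.1):
    """Stage 2 (toy): pick bonds between product atoms and group atoms.

    Each candidate (product index, group index) pair is kept with
    probability p_bond, standing in for a learned bond predictor.
    """
    bonds = []
    for i in range(len(product_atoms)):
        for j in range(len(group_atoms)):
            if rng.random() < p_bond:
                bonds.append((i, j))
    return bonds


def toy_retrosynthesis(product_atoms, max_group_size=4, seed=0):
    """Run the two stages in order: external group first, external bonds second."""
    rng = random.Random(seed)
    group = sample_external_group(product_atoms, max_group_size, rng)
    bonds = sample_external_bonds(product_atoms, group, rng)
    return group, bonds


if __name__ == "__main__":
    group, bonds = toy_retrosynthesis(["C", "C", "O", "N"])
    print("external group atoms:", group)
    print("external bonds (product index, group index):", bonds)
```

In this sketch the bonds chosen in stage 2 play the role that reaction center identification plays in a conventional semi-template pipeline, except that they are chosen after the external group is already known.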