G2Retro as a two-step graph generative models for retrosynthesis prediction

Ziqi Chen, Oluwatosin R. Ayinde, James R. Fuchs, Huan Sun, Xia Ning
DOI: https://doi.org/10.1038/s42004-023-00897-3
IF: 7.211
2023-05-31
Communications Chemistry
Abstract: Retrosynthesis is a procedure where a target molecule is transformed into potential reactants and thus the synthesis routes can be identified. Recently, computational approaches have been developed to accelerate the design of synthesis routes. In this paper, we develop a generative framework G2Retro for one-step retrosynthesis prediction. G2Retro imitates the reversed logic of synthetic reactions. It first predicts the reaction centers in the target molecules (products), identifies the synthons needed to assemble the products, and transforms these synthons into reactants. G2Retro defines a comprehensive set of reaction center types, and learns from the molecular graphs of the products to predict potential reaction centers. To complete synthons into reactants, G2Retro considers all the involved synthon structures and the product structures to identify the optimal completion paths, and accordingly attaches small substructures sequentially to the synthons. Here we show that G2Retro is able to better predict the reactants for given products in the benchmark dataset than the state-of-the-art methods.
chemistry, multidisciplinary
What problem does this paper attempt to address?
This paper addresses one-step retrosynthesis prediction: given a target molecule, predict the reactants that could directly produce it. The authors develop a generative framework, G2Retro, for this task. The problem matters in medicinal chemistry because quickly and accurately identifying feasible synthetic routes is crucial for the experimental synthesis of drug-like molecules. Traditional retrosynthetic planning relies mainly on the knowledge and experience of synthetic and medicinal chemists, which makes it subjective and hard to keep current with newly reported reactions. Data-driven prediction can therefore complement chemists' judgment by proposing a large number of candidate reactions for them to evaluate, which in turn can accelerate drug development.

The main contribution of G2Retro is that it mimics the reversed logic of synthetic reactions. It first predicts the reaction centers in the target molecule, then identifies the synthons needed to assemble the product, and finally converts these synthons into reactants. According to the authors, this design improves prediction accuracy and also makes the model more interpretable, because researchers can inspect which reaction centers were predicted and how the synthons are completed step by step into reactants. The paper reports that G2Retro outperforms existing state-of-the-art methods on the benchmark dataset, which suggests it could help accelerate retrosynthetic analysis.
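
To make the three stages concrete, the following is a minimal toy sketch in plain Python; it is not the G2Retro implementation. It assumes a molecule is a simple graph of heavy atoms with untyped bonds, and it hard-codes both the reaction center and the attached leaving group, whereas G2Retro predicts these with learned models. The function names (`connected_components`, `split_at_reaction_center`, `complete_synthons`) and the example molecule (N-methylacetamide split into an acyl chloride and an amine) are illustrative assumptions.

```python
from collections import defaultdict


def connected_components(atoms, bonds):
    """Return connected components as sorted lists of atom ids."""
    adjacency = defaultdict(set)
    for bond in bonds:
        u, v = tuple(bond)
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def split_at_reaction_center(atoms, bonds, center):
    """Stages 1-2 (toy): break the given bond and return synthons and remaining bonds.

    In G2Retro the reaction center is predicted; here it is passed in by hand.
    """
    remaining = {bond for bond in bonds if bond != frozenset(center)}
    return connected_components(atoms, remaining), remaining


def complete_synthons(atoms, bonds, attachments):
    """Stage 3 (toy): attach one new atom per (anchor_atom_id, element) pair.

    G2Retro attaches small substructures sequentially; a single atom stands in here.
    """
    atoms, bonds = dict(atoms), set(bonds)
    for anchor, element in attachments:
        new_id = max(atoms) + 1
        atoms[new_id] = element
        bonds.add(frozenset((anchor, new_id)))
    return atoms, bonds


# Heavy atoms of N-methylacetamide: C(0)-C(1)(-O(2))-N(3)-C(4); bond orders ignored.
product_atoms = {0: "C", 1: "C", 2: "O", 3: "N", 4: "C"}
product_bonds = {frozenset(p) for p in [(0, 1), (1, 2), (1, 3), (3, 4)]}

# Hypothetical reaction center: the amide C-N bond between atoms 1 and 3.
synthons, synthon_bonds = split_at_reaction_center(product_atoms, product_bonds, (1, 3))

# Hypothetical completion: add Cl to the acyl carbon; the amine side needs only implicit H.
reactant_atoms, reactant_bonds = complete_synthons(product_atoms, synthon_bonds, [(1, "Cl")])
reactants = connected_components(reactant_atoms, reactant_bonds)

print("synthons:", synthons)
print("reactants:", [[reactant_atoms[i] for i in r] for r in reactants])
```

In this sketch the product splits into two synthons, and completing the acyl synthon with chlorine gives two reactant graphs corresponding to acetyl chloride and methylamine. A real system would also track bond orders, charges, and hydrogens, which this toy graph omits.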