Learning A Hierarchical Graph Autoregression Model for Semi-template Molecular Retrosynthesis

Shen Yuan, Fanmeng Wang, Zhewei Wei, Peilin Zhao, Lanqing Li, Hongteng Xu
DOI: https://doi.org/10.26434/chemrxiv-2024-gqp7b
2024-12-03
Abstract: Molecular retrosynthesis, an important task in pharmaceutical and chemical engineering, aims to predict candidate reactants for a given product. Treating this challenging task as a conditional generative modeling problem, we propose a hierarchical graph autoregression (HGAR) model and a pretraining-assisted multi-task learning paradigm for it, leading to an effective semi-template molecular retrosynthesis method. Given a product, we first construct a hierarchical graph by connecting the junction tree of its motifs to the atom-level molecular graph. Our HGAR model embeds the hierarchical graph at both the motif level and the atom level. The atom-level embeddings are used to predict reaction centers and derive synthons from the product. The motif-level embeddings are used to predict motifs and complete the corresponding synthons autoregressively, yielding the target reactants. We first pretrain the model on PCQM4M-LSC and then fine-tune it on the USPTO retrosynthesis datasets, leading to a model with good generalization power. Experiments show that HGAR outperforms many representative molecular retrosynthesis methods, especially semi-template ones, indicating its feasibility and effectiveness in practice.
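The two graph operations named in the abstract (linking a motif-level tree to the atom-level graph, and deriving synthons by breaking a reaction-center bond) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy molecule, the hand-picked motif partition, the breadth-first spanning tree used in place of a real junction-tree construction, and the function names `build_hierarchical_graph` and `split_into_synthons` are all assumptions made for illustration.

```python
from collections import deque


def build_hierarchical_graph(num_atoms, bonds, motifs):
    """Return (motif_tree_edges, atom_motif_edges) for a toy molecule.

    Assumption: `motifs` is a hand-supplied list of atom-index sets;
    the abstract does not say how motifs are extracted.
    """
    if not motifs:
        return [], []

    # Map each atom to the motifs that contain it.
    atom_to_motifs = {a: [] for a in range(num_atoms)}
    for m, atoms in enumerate(motifs):
        for a in atoms:
            atom_to_motifs[a].append(m)

    # Two motifs are adjacent if they share an atom or a bond joins them.
    adjacency = {m: set() for m in range(len(motifs))}
    for ms in atom_to_motifs.values():
        for i in ms:
            for j in ms:
                if i != j:
                    adjacency[i].add(j)
    for u, v in bonds:
        for i in atom_to_motifs[u]:
            for j in atom_to_motifs[v]:
                if i != j:
                    adjacency[i].add(j)
                    adjacency[j].add(i)

    # Assumption: a BFS spanning tree stands in for the junction tree.
    tree_edges, seen, queue = [], {0}, deque([0])
    while queue:
        m = queue.popleft()
        for n in sorted(adjacency[m]):
            if n not in seen:
                seen.add(n)
                tree_edges.append((m, n))
                queue.append(n)

    # Cross-level edges connect every atom to the motif(s) containing it.
    atom_motif_edges = [
        (a, m) for m, atoms in enumerate(motifs) for a in sorted(atoms)
    ]
    return tree_edges, atom_motif_edges


def split_into_synthons(num_atoms, bonds, reaction_center):
    """Break one bond (the reaction center) and return the atom sets
    of the resulting connected components, i.e. the synthons."""
    neighbors = {a: set() for a in range(num_atoms)}
    for u, v in bonds:
        if {u, v} != set(reaction_center):
            neighbors[u].add(v)
            neighbors[v].add(u)

    synthons, visited = [], set()
    for start in range(num_atoms):
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:
            a = stack.pop()
            if a not in component:
                component.add(a)
                stack.extend(neighbors[a] - component)
        visited |= component
        synthons.append(sorted(component))
    return synthons


# Toy product (hypothetical): C0-C1(=O2)-N3-C4, bond orders ignored.
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
motifs = [{0, 1, 2}, {3, 4}]  # hand-picked partition, an assumption
tree_edges, atom_motif_edges = build_hierarchical_graph(5, bonds, motifs)
synthons = split_into_synthons(5, bonds, reaction_center=(1, 3))
```

In this toy case the motif tree has a single edge between the two motifs, and breaking the C1-N3 bond separates the atoms into two synthons. In the method described above, the reaction center would be predicted from atom-level embeddings rather than supplied by hand, and the synthons would then be completed with predicted motifs.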
Chemistry