SynTemp: Efficient Extraction of Graph-Based Reaction Rules from Large-Scale Reaction Databases

Tieu-Long Phan, Klaus Weinbauer, Marcos E. Gonzalez Laffitte, Yingjie Pan, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, Peter F. Stadler, Christoph Flamm
DOI: https://doi.org/10.26434/chemrxiv-2024-tkm36
2024-09-30
Abstract: SynTemp is a framework designed to extract and hierarchically cluster reaction templates from large-scale reaction data repositories. Reaction templates are partial Imaginary Transition State graphs representing the reaction center as well as its surrounding context. These graphs are equivalent to Double Pushout graph rewriting rules and can thus be applied directly to predict reaction outcomes at the structural formula level. Rule inference is based on a consensus of multiple atom-atom mapping (AAM) tools, integrating predictions from RXNMapper, GraphormerMapper, and LocalMapper by means of a robust graph-theoretic methodology for comparing partial atom-atom mappings. SynTemp achieves an exceptional accuracy of 99.5% and a success rate of 71.23% in obtaining AAMs on the Chemical Reaction Dataset. Reaction centers with surrounding contexts are extracted and completed with mechanistically relevant hydrogen atoms to obtain complete reaction templates. Subsequently, the templates are categorized into distinct groups based on topological features using hierarchical clustering, resulting in a library of 311 transformation rules that explains 86% of the reaction data set. The remaining 14% is unresolved due to non-equivalent AAMs and ambiguous hydrogen placements. Despite these challenges, the coverage of our templates remains high at approximately 93.5-94.5%, surpassing that of RDChiral using SMARTS templates.
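To make the notion of an Imaginary Transition State (ITS) graph and its reaction center concrete, the following is a minimal sketch and not SynTemp's implementation. It assumes an atom-mapped reaction is already available as two dictionaries of bonds keyed by pairs of atom-map numbers; the function names `its_edges` and `reaction_center` and the toy substitution example are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: bonds are keyed by frozenset({map_i, map_j}) and
# valued by bond order. These names and this data layout are assumptions.

def its_edges(reactant_bonds, product_bonds):
    """Overlay reactant and product bonds into ITS edges.

    Each ITS edge carries a pair (order_before, order_after); a missing
    bond on either side is recorded as order 0.
    """
    keys = set(reactant_bonds) | set(product_bonds)
    return {
        k: (reactant_bonds.get(k, 0), product_bonds.get(k, 0))
        for k in keys
    }


def reaction_center(its):
    """Keep only the edges whose bond order changes during the reaction."""
    return {k: v for k, v in its.items() if v[0] != v[1]}


# Toy example: ethyl bromide + hydroxide -> ethanol + bromide.
# Atom-map numbers: 1 = C (reacting), 2 = Br, 3 = O, 4 = C (spectator).
reactants = {frozenset({1, 2}): 1, frozenset({1, 4}): 1}
products = {frozenset({1, 3}): 1, frozenset({1, 4}): 1}

its = its_edges(reactants, products)
center = reaction_center(its)

for bond, (before, after) in sorted(center.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(bond), before, "->", after)
```

In this toy case the C-Br bond is broken and the C-O bond is formed, so both appear in the reaction center, while the unchanged C-C bond stays in the ITS graph only as context. Comparing mappings produced by different AAM tools would additionally require a graph isomorphism test on such graphs, which this sketch does not attempt.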
Chemistry
What problem does this paper attempt to address?