Abstract: Chemical reaction prediction, encompassing forward synthesis and retrosynthesis, is a fundamental challenge in organic synthesis. A widely adopted computational approach frames synthesis prediction as a sequence-to-sequence translation task, using the common SMILES representation for molecules. Current evaluations of machine learning methods for retrosynthesis assume perfect training data, overlooking imperfections in the reaction equations of popular datasets, such as missing reactants or products and omitted physical and practical constraints such as temperature and cost. These imperfections stem primarily from a focus on the target molecule. The result is an incomplete representation of viable synthetic routes, especially when multiple sets of reactants can yield a given desired product. In response to these shortcomings, this study examines the prevailing evaluation methods and introduces comprehensive metrics designed to address imperfections in the dataset. Our metrics assess absolute accuracy by comparing predicted outputs with the ground truth, and they also provide scores for partial correctness and an adjusted accuracy computed through graph matching, acknowledging the inherent complexities of retrosynthetic pathways. Additionally, we explore the impact of small molecular augmentations that preserve chemical properties, and we employ similarity matching to enhance the assessment of prediction quality. We introduce SynFormer, a sequence-to-sequence model tailored to the SMILES representation. It incorporates architectural enhancements to the original transformer to address the challenges of chemical reaction prediction. SynFormer achieves a top-1 accuracy of 53.2% on the USPTO-50k dataset, an improvement over previous state-of-the-art language models, while being more efficient and eliminating the need for pre-training.
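To make the distinction between absolute accuracy and partial correctness concrete, the following is a minimal sketch and not the metric definitions used in this work. It assumes that a predicted reactant set is a SMILES string with molecules separated by `.`, that the strings are already canonicalized, and that partial correctness can be approximated by the Jaccard overlap between predicted and ground-truth molecule sets. The function names `split_molecules`, `exact_match`, `partial_score`, and `top_k_accuracy` are hypothetical, and the graph-matching and similarity-based scores mentioned in the abstract are not covered here.

```python
from typing import FrozenSet, List


def split_molecules(smiles: str) -> FrozenSet[str]:
    """Split a '.'-separated SMILES string into a set of molecule strings.

    Assumption: inputs are already canonical SMILES, so string equality
    stands in for molecular identity. Duplicates (stoichiometry) are ignored.
    """
    return frozenset(part for part in smiles.strip().split(".") if part)


def exact_match(pred: str, truth: str) -> bool:
    """Absolute correctness: same set of molecules, in any order."""
    return split_molecules(pred) == split_molecules(truth)


def partial_score(pred: str, truth: str) -> float:
    """Illustrative partial-correctness score: Jaccard overlap of molecule sets."""
    p, t = split_molecules(pred), split_molecules(truth)
    if not p and not t:
        return 1.0  # two empty sets are treated as identical
    return len(p & t) / len(p | t)


def top_k_accuracy(predictions: List[List[str]], truths: List[str], k: int) -> float:
    """Fraction of targets whose ground truth appears among the first k candidates."""
    hits = sum(
        any(exact_match(cand, truth) for cand in cands[:k])
        for cands, truth in zip(predictions, truths)
    )
    return hits / len(truths)


if __name__ == "__main__":
    # One of two predicted reactants matches the ground truth.
    print(partial_score("CCO.CC(=O)O", "CC(=O)O.CCN"))
    # The ground truth for the first target is ranked second, so it counts at k=2 only.
    preds = [["CCN", "CCO"], ["CC"]]
    print(top_k_accuracy(preds, ["CCO", "CCC"], k=1))
    print(top_k_accuracy(preds, ["CCO", "CCC"], k=2))
```

Under an exact-match criterion, a prediction that recovers one of two reactants scores zero, whereas the overlap score gives it partial credit. In practice, SMILES strings would need to be canonicalized with a cheminformatics toolkit before comparison, since one molecule can be written in many equivalent ways.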
