Abstract: Chemical reaction prediction, encompassing forward synthesis and retrosynthesis, is a fundamental challenge in organic synthesis. A widely adopted computational approach frames synthesis prediction as a sequence-to-sequence translation task, using the common SMILES representation for molecules. Current evaluations of machine learning methods for retrosynthesis assume perfect training data, overlooking imperfections in the reaction equations of popular datasets, such as missing reactants or products and omitted physical and practical constraints such as temperature and cost. These omissions stem primarily from a focus on the target molecule. The result is an incomplete representation of viable synthetic routes, especially when multiple sets of reactants can yield a given desired product. In response to these shortcomings, this study examines the prevailing evaluation methods and introduces comprehensive metrics designed to account for imperfections in the dataset. Our metrics assess absolute accuracy by comparing predicted outputs with the ground truth, and they add a more nuanced evaluation: we provide scores for partial correctness and compute an adjusted accuracy through graph matching, acknowledging the inherent complexities of retrosynthetic pathways. Additionally, we explore the impact of small molecular augmentations that preserve chemical properties and employ similarity matching to enhance the assessment of prediction quality. We introduce SynFormer, a sequence-to-sequence model tailored to the SMILES representation. It incorporates architectural enhancements to the original transformer to address the challenges of chemical reaction prediction. SynFormer achieves a top-1 accuracy of 53.2% on the USPTO-50k dataset, an improvement over previous state-of-the-art language models, while being more efficient and eliminating the need for pre-training.
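To make the distinction between absolute accuracy and partial correctness concrete, the following is a minimal sketch and not the paper's metric definitions. It assumes a reaction side is written as dot-separated SMILES, compares molecules as plain strings (a real evaluation would canonicalize the SMILES first), and defines the partial score as the fraction of reference reactants recovered. The function names `split_smiles`, `exact_match`, and `partial_score` are hypothetical.

```python
from typing import FrozenSet


def split_smiles(reaction_side: str) -> FrozenSet[str]:
    """Split a dot-separated SMILES string into a set of molecule strings.

    Assumption: strings are compared verbatim, with no canonicalization.
    """
    return frozenset(part for part in reaction_side.split(".") if part)


def exact_match(predicted: str, reference: str) -> bool:
    """Absolute correctness: same set of reactants, regardless of order."""
    return split_smiles(predicted) == split_smiles(reference)


def partial_score(predicted: str, reference: str) -> float:
    """Hypothetical partial score: fraction of reference reactants recovered."""
    pred = split_smiles(predicted)
    ref = split_smiles(reference)
    if not ref:
        return 0.0
    return len(pred & ref) / len(ref)


# Reference reactants: acetic acid and ethanol.
reference = "CC(=O)O.OCC"

# Same reactants in a different order count as an exact match.
print(exact_match("OCC.CC(=O)O", reference))

# One of the two reference reactants is recovered (acetyl chloride replaces acetic acid).
print(partial_score("OCC.CC(=O)Cl", reference))
```

Under this sketch, a prediction that names a different but chemically plausible reactant still receives partial credit for the reactant it shares with the reference, whereas exact matching would count it as entirely wrong.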
