Abstract: AI-assisted synthesis planning has emerged as a valuable tool for accelerating synthetic chemistry in the discovery of new drugs and materials. The template-free approach, which shows superior generalization, is seen as the mainstream direction in this field. However, it remains unclear whether such an end-to-end approach can match the problem-solving performance of experienced chemists without fully revealing insights into the chemical mechanisms involved. Moreover, the field lacks unified, chemically inspired frameworks for improving multitask reaction prediction. In this study, we address these challenges by investigating the impact of fine-grained reaction-type labels on multiple downstream tasks and by proposing a novel framework named SynCluster. The framework incorporates unsupervised clustering cues into the baseline models and identifies plausible chemical subspaces, which are compatible with multitask extensions and can serve as model-independent indicators that enhance performance on multiple downstream tasks. In retrosynthesis prediction, SynCluster improves top-1 and top-10 accuracy by 4.1% and 11.0%, respectively, over the baseline Molecular Transformer, and improves top-10 accuracy by 13.9% when combined with Retroformer. With simplified molecular-input line-entry system (SMILES) augmentation, our framework achieves higher top-10 accuracy than state-of-the-art sequence-based retrosynthesis models and improves over the baseline on the diversity and validity of predicted reactants. SynCluster also achieves 94.9% top-10 accuracy in forward synthesis prediction and 51.5% top-10 Maxfrag accuracy in reagent prediction. Overall, SynCluster provides a fresh perspective on synthesis design, offering chemical interpretability and reinforcing domain knowledge. It offers a promising route to more accurate and efficient AI-assisted synthesis planning and helps bridge the gap between template-free approaches and the problem-solving abilities of experienced chemists.
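
The abstract reports top-k accuracy and describes feeding clustering cues to a sequence model. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the `<CLUSTER_n>` token format and the function names `tag_with_cluster` and `top_k_accuracy` are hypothetical, and the exact string matching assumes that predicted and reference SMILES have already been canonicalized in the same way.

```python
from typing import Sequence


def tag_with_cluster(smiles: str, cluster_id: int) -> str:
    """Prepend a cluster token to a SMILES string.

    The token format is a hypothetical choice for illustration only.
    """
    return f"<CLUSTER_{cluster_id}> {smiles}"


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose target is among the first k ranked predictions.

    Assumes all strings are canonical SMILES, so equality of strings
    stands in for equality of molecules.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(target in ranked[:k] for ranked, target in zip(predictions, targets))
    return hits / len(targets)


# Toy data: three examples, each with two ranked candidate outputs.
ranked = [["CCO", "CC=O"], ["c1ccccc1", "CCN"], ["CO", "CCC"]]
truth = ["CCO", "CCN", "CCCC"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first example is a hit
top2 = top_k_accuracy(ranked, truth, k=2)  # the first two examples are hits
tagged = tag_with_cluster("CCO", 3)
```

In this toy setting, widening k from 1 to 2 recovers the second example, which mirrors why top-10 figures in the abstract are higher than top-1 figures for the same model.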

SynCoTrain: A Dual Classifier PU-learning Framework for Synthesizability Prediction

Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis

Semi-supervised teacher-student deep neural network for materials discovery

Predicting Synthesizability using Machine Learning on Databases of Existing Inorganic Materials

Prediction of Organic Reaction Outcomes Using Machine Learning

SynCluster: Reaction Type Clustering and Recommendation Framework for Synthesis Planning

Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions

Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs using Large Language Models

Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning

Machine-Learning Rationalization and Prediction of Solid-State Synthesis Conditions

Predictive Synthesis of Quantum Materials by Probabilistic Reinforcement Learning

Dissecting Errors in Machine Learning for Retrosynthesis: A Granular Metric Framework and Transformer-Based Model for More Informative Predictions

Predicting Miscibility in Binary Compounds: A Machine Learning and Genetic Algorithm Study

Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks

Connecting metal-organic framework synthesis to applications with a self-supervised multimodal model

Prediction of Synthesis of 2D Metal Carbides and Nitrides (MXenes) and Their Precursors with Positive and Unlabeled Machine Learning

Explainable Synthesizability Prediction of Inorganic Crystal Structures using Large Language Models

Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

Network analysis of synthesizable materials discovery

Predicting and Accelerating Nanomaterials Synthesis Using Machine Learning Featurization

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction