Investigating the Reliability and Interpretability of Machine Learning Frameworks for Chemical Retrosynthesis

Friedrich Hastedt, Rowan M. Bailey, Klaus Hellgardt, Sophia N. Yaliraki, Ehecatl Antonio del Rio Chanona, Dongda Zhang
DOI: https://doi.org/10.26434/chemrxiv-2024-qdgnv-v3
2024-02-12
Abstract: Machine learning models for chemical retrosynthesis have attracted substantial interest in recent years. Unaddressed challenges, particularly the absence of robust evaluation metrics for performance comparison, and the lack of black-box interpretability, obscure model limitations and impede progress in the field. We present an automated benchmarking pipeline designed for effective model performance comparisons. With an emphasis on user-friendly design, we aim to streamline accessibility and facilitate utilisation within the research community. Additionally, we suggest and perform a new interpretability study to uncover the degree of chemical understanding acquired by retrosynthesis models. Our results reveal that frameworks based on chemical reaction rules yield the most diverse, chemically valid, and feasible reactions, whereas purely data-driven frameworks suffer from unfeasible and invalid predictions. The interpretability study emphasises that incorporating reaction rules not only enhances model performance but also improves interpretability. For simple molecules, we demonstrate that Graph Neural Networks identify relevant functional groups within the product molecule, providing thermodynamic stabilisation over the reactant precursors. In contrast, the popular Transformer fails to identify such crucial stabilisation. As the molecule and reaction mechanism grow more complex, both data-driven models propose unfeasible disconnections without offering a chemical rationale. We stress the importance of incorporating chemically meaningful descriptors within deep-learning models. Our study provides valuable guidance for the future development of retrosynthesis frameworks.
Chemistry
What problem does this paper attempt to address?
This paper examines the reliability and interpretability of machine learning frameworks for chemical retrosynthesis. Although such models have attracted wide attention, the absence of robust evaluation metrics for performance comparison and the black-box nature of the models obscure their limitations and hinder progress in the field. The authors present an automated benchmarking pipeline for comparing model performance, designed to be user-friendly so that the research community can adopt it easily. They also propose and carry out a new interpretability study to gauge how much chemical understanding retrosynthesis models acquire. The results indicate that frameworks based on chemical reaction rules yield the most diverse, chemically valid, and feasible reactions, whereas purely data-driven frameworks tend to produce infeasible and invalid predictions. The interpretability study shows that incorporating reaction rules improves both model performance and interpretability. For simple molecules, graph neural networks identify the functional groups in the product molecule that provide thermodynamic stabilization over the reactant precursors, whereas the popular Transformer fails to identify this stabilization. As molecules and reaction mechanisms grow more complex, both data-driven models propose infeasible disconnections without offering a chemical rationale. The paper stresses the importance of incorporating chemically meaningful descriptors into deep learning models and offers guidance for the future development of retrosynthesis frameworks. Keywords include chemical retrosynthesis, machine learning, interpretable artificial intelligence, graph neural networks, and Transformer. The rest of the paper details the methods, results, and future research directions.
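To make the idea of comparing retrosynthesis models on more than one metric concrete, the following is a minimal sketch of two simple scores: top-k exact-match accuracy and the fraction of unique predictions per product (a crude stand-in for diversity). This is not the paper's pipeline; the function names, the toy data, and the assumption that all precursor strings are already canonicalised identically are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def top_k_accuracy(
    predictions: Dict[str, List[str]],
    references: Dict[str, str],
    k: int,
) -> float:
    """Fraction of products whose reference precursors appear in the first k predictions.

    Assumes predicted and reference strings were canonicalised the same way,
    so plain string equality is a meaningful comparison.
    """
    if not references:
        return 0.0
    hits = sum(
        1
        for product, truth in references.items()
        if truth in predictions.get(product, [])[:k]
    )
    return hits / len(references)


def unique_fraction(predictions: Dict[str, List[str]]) -> float:
    """Share of predictions that are not duplicates within their own product's list."""
    total = sum(len(preds) for preds in predictions.values())
    if total == 0:
        return 0.0
    unique = sum(len(set(preds)) for preds in predictions.values())
    return unique / total


# Toy data (hypothetical): product SMILES mapped to ranked precursor predictions.
predictions = {
    "CC(=O)OC": ["CC(=O)O.CO", "CC(=O)Cl.CO", "CC(=O)O.CO"],
    "CCO": ["C=C.O", "CCBr.O"],
}
references = {
    "CC(=O)OC": "CC(=O)Cl.CO",
    "CCO": "CC=O",
}

top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
uniq = unique_fraction(predictions)
print(top1, top2, uniq)
```

Exact-match accuracy alone rewards only the recorded reaction, which is one reason a benchmark might report it alongside validity, feasibility, and diversity scores. Checking chemical validity would additionally require a cheminformatics toolkit and is left out of this sketch.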