Assessing the Extrapolation Capability of Template-Free Retrosynthesis Models

Shuan Chen, Yousung Jung
2024-02-29
Abstract: Although template-free models are acknowledged to be better than template-based models at exploring unseen reaction spaces for retrosynthesis prediction, their ability to venture beyond established boundaries remains relatively uncharted. In this study, we empirically assess the extrapolation capability of state-of-the-art template-free models by meticulously assembling an extensive set of out-of-distribution (OOD) reactions. Our findings demonstrate that while template-free models exhibit potential in predicting precursors with novel synthesis rules, their top-10 exact-match accuracy in OOD reactions is strikingly modest (< 1%). Furthermore, despite their ability to generate novel reactions, our investigation highlights a recurring issue where more than half of the novel reactions predicted by template-free models are chemically implausible. Consequently, we advocate for the future development of template-free models that integrate considerations of chemical feasibility when navigating unexplored regions of reaction space.
Chemical Physics, Machine Learning
What problem does this paper attempt to address?
This paper evaluates how well template-free retrosynthesis models predict reactions that lie outside the distribution of their training data (out-of-distribution, or OOD, reactions). The study found that although these models can predict precursors with novel synthesis rules, their top-10 exact-match accuracy on OOD reactions is very low (less than 1%). Furthermore, more than half of the novel reactions the models predict are chemically implausible. The authors test these models empirically on a carefully assembled set of OOD reactions drawn from different datasets, and they argue that future template-free models should account for chemical feasibility when navigating unexplored regions of reaction space. The findings highlight the gap between the ability of template-free models to generate novel reactions and their practical reliability, and they call for further improvements that ensure the predicted reactions are chemically sound.
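The headline metric, top-k exact-match accuracy, counts a target molecule as correctly predicted if its true set of precursors appears anywhere among the model's k highest-ranked suggestions. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes every SMILES string has already been canonicalized by a cheminformatics toolkit, it treats the dot-separated precursors of a reaction as an unordered set, and the function names `normalize` and `top_k_exact_match` are hypothetical.

```python
from typing import FrozenSet, Sequence


def normalize(smiles: str) -> FrozenSet[str]:
    """Treat a multi-molecule SMILES string as an unordered set of molecules.

    Assumption: each dot-separated molecule is already canonical SMILES;
    no chemical canonicalization is performed here.
    """
    return frozenset(part for part in smiles.split(".") if part)


def top_k_exact_match(predictions: Sequence[Sequence[str]],
                      ground_truth: Sequence[str],
                      k: int = 10) -> float:
    """Fraction of targets whose true precursor set is among the top-k predictions.

    predictions[i] is the ranked list of predicted precursor SMILES for target i;
    ground_truth[i] is the recorded precursor SMILES for target i.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = 0
    for ranked, truth in zip(predictions, ground_truth):
        target = normalize(truth)
        # Only the first k ranked candidates are considered.
        if any(normalize(candidate) == target for candidate in ranked[:k]):
            hits += 1
    return hits / len(ground_truth)


if __name__ == "__main__":
    preds = [["CCO.CC(=O)O", "CCO"], ["c1ccccc1"]]
    truth = ["CC(=O)O.CCO", "CCN"]
    print(top_k_exact_match(preds, truth, k=1))  # prints 0.5
```

In the usage example, the first target matches at rank 1 because precursor order is ignored, while the second target has no match, giving 0.5. Comparing unordered sets rather than raw strings avoids penalizing a prediction that lists the same precursors in a different order.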