LinChemIn: Route Arithmetic — Operations on Digital Synthetic Routes

Marta Pasquini, Marco Stenta
DOI: https://doi.org/10.26434/chemrxiv-2023-g84vw
2023-11-03
Abstract: Computational tools are revolutionizing our understanding and prediction of chemical reactivity by combining traditional data analysis techniques with new predictive models. These tools extract additional value from the reaction data corpus, but to effectively convert this value into actionable knowledge, domain specialists need to interact easily with the computer-generated output. In this application note, we demonstrate the capabilities of the open-source Python toolkit LinChemIn, which simplifies the manipulation of reaction networks and provides advanced functionality for working with synthetic routes. LinChemIn ensures chemical consistency when merging, editing, mining, and analyzing reaction networks. Its flexible input interface can process routes from various sources, including predictive models and expert input. The toolkit also efficiently extracts individual routes from the combined synthetic tree, identifying alternative paths and reaction combinations. By reducing the operational barrier to accessing and analyzing synthetic routes from multiple sources, LinChemIn facilitates a constructive interplay between Artificial Intelligence and human expertise.
Chemistry
What problem does this paper attempt to address?
This paper addresses how to simplify the processing and analysis of synthetic routes so that artificial intelligence (AI) models and human experts can collaborate effectively. It introduces LinChemIn, an open-source Python toolkit that ensures chemical consistency when merging, editing, mining, and analyzing reaction networks. Its flexible input interface handles synthetic routes from various sources, including predictive models and expert input. The toolkit can extract individual routes from a synthesis tree and mine alternative paths from it, giving chemists a wider choice of strategies for a given synthetic target.

With LinChemIn, scientists can edit predicted synthetic routes by adding or removing reaction steps, making adjustments based on additional predictions, literature data, or chemical intuition. The paper also shows how to build a synthesis tree by merging routes from different sources and how to mine new routes from that tree, which increases the efficiency and diversity of route design. These functionalities (route identification, editing, merging, and mining) are demonstrated in a practical case study on the predicted synthetic route of the antiviral drug Amenamevir.

Together, these operations lower the barrier between scientists and AI model outputs and support data- and model-driven design and selection of synthetic routes.
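
The merge-then-mine idea described above can be illustrated with a small, library-free sketch. It does not use or reproduce the LinChemIn API: the data model (a reaction as a `(product, reactants)` tuple, a route as a set of reactions), the function names `merge_routes` and `mine_routes`, and the placeholder molecule labels are all assumptions made for illustration only.

```python
from itertools import product

# Assumed data model (not LinChemIn's): a reaction is (product, frozenset of
# reactants); a route is a frozenset of reactions. Labels are placeholders,
# not real Amenamevir intermediates.

def merge_routes(*routes):
    """Merge routes into one tree; identical reactions collapse into one."""
    merged = set()
    for route in routes:
        merged |= set(route)
    return merged


def mine_routes(tree, target, _seen=frozenset()):
    """Enumerate every distinct route to `target` contained in `tree`."""
    if target in _seen:
        return []  # cycle: this branch cannot yield a valid route
    producers = [rxn for rxn in tree if rxn[0] == target]
    if not producers:
        return [frozenset()]  # starting material: no reaction needed
    routes = set()
    for rxn in producers:
        _, reactants = rxn
        # Routes available for each reactant of this reaction
        sub = [mine_routes(tree, m, _seen | {target}) for m in sorted(reactants)]
        # Pick one sub-route per reactant, in every possible combination
        for combo in product(*sub):
            routes.add(frozenset({rxn}).union(*combo))
    return list(routes)


# Two routes that share the final step but make each intermediate differently
final_step = ("T", frozenset({"I1", "I2"}))
route_a = frozenset({final_step, ("I1", frozenset({"A"})), ("I2", frozenset({"B"}))})
route_b = frozenset({final_step, ("I1", frozenset({"C"})), ("I2", frozenset({"D"}))})

merged = merge_routes(route_a, route_b)
mined = mine_routes(merged, "T")
print(len(merged), len(mined))  # 5 reactions in the tree, 4 routes mined
```

In this toy case the merged tree holds five reactions, and mining it yields four routes: the two originals plus two new combinations (for example, `I1` made from `A` together with `I2` made from `D`) that neither source proposed on its own.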