Simple User-Friendly Reaction Format

Alex T. Müller, David F. Nippa, Kenneth Atz, David B. Konrad, Uwe Grether, Rainer E. Martin, Gisbert Schneider
DOI: https://doi.org/10.26434/chemrxiv-2023-nfq7h-v2
2024-05-07
Abstract: Leveraging the increasing volume of chemical reaction data can enhance synthesis planning and improve success rates. However, machine learning applications for retrosynthesis planning and forward reaction prediction tools depend on having readily available, high-quality data in a structured format. While some public and licensed reaction databases are available, they frequently lack essential information about reaction conditions. To address this issue and promote the principles of findable, accessible, interoperable, and reusable (FAIR) data reporting and sharing, we introduce the Simple User-Friendly Reaction Format (SURF). SURF standardizes the documentation of reaction data through a structured tabular format, requiring only a basic understanding of spreadsheets. This format enables chemists to record the synthesis of molecules in a format that is both human- and machine-readable, making it easier to share and integrate directly into machine-learning pipelines. SURF files are designed to be interoperable, easily imported into relational databases, and convertible into other formats. This complements existing initiatives like the Open Reaction Database (ORD) and Unified Data Model (UDM). At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and expediting the chemical synthesis process.
Chemistry
What problem does this paper attempt to address?
This paper addresses how to standardize chemical reaction data so that it can be used effectively for synthesis planning and for improving success rates. Although some public and licensed reaction databases exist, they often lack crucial information about reaction conditions, while machine learning tools for retrosynthesis planning and forward reaction prediction depend on high-quality, structured data. To close this gap and promote the findable, accessible, interoperable, and reusable (FAIR) data principles, the paper introduces the Simple User-Friendly Reaction Format (SURF). SURF records reaction data in a structured tabular format that requires only a basic understanding of spreadsheets, so chemists can document syntheses in a form that is readable by both humans and machines, easy to share, and directly usable in machine learning pipelines. SURF files are designed to be interoperable, easily imported into relational databases, and convertible into other formats, complementing existing initiatives such as the Open Reaction Database (ORD) and the Unified Data Model (UDM). At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and accelerating the chemical synthesis process.
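As a rough illustration of what "tabular, machine-readable, and easily imported into a relational database" can mean in practice, the following Python sketch reads a small tab-separated reaction table and loads it into an in-memory SQLite database. The column names (rxn_id, reactant_smiles, and so on) and the two rows are made-up placeholders for illustration only; they are not taken from the SURF specification, which the text above does not reproduce.

```python
import csv
import io
import sqlite3

# Toy tab-separated reaction table. Column names and values are illustrative
# assumptions, NOT the official SURF columns and NOT real experimental data.
TABLE_TEXT = (
    "rxn_id\treactant_smiles\tproduct_smiles\tsolvent\ttemperature_c\tyield_percent\n"
    "r1\tCCO.CC(=O)O\tCCOC(C)=O\tnone\t80\t65\n"
    "r2\tCC(=O)Cl.NCC\tCCNC(C)=O\tDCM\t25\t90\n"
)


def load_reactions(text: str) -> list[dict]:
    """Parse the tab-separated text into one dict per reaction row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def to_sqlite(rows: list[dict]) -> sqlite3.Connection:
    """Create an in-memory SQLite table and insert the parsed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reactions ("
        "rxn_id TEXT PRIMARY KEY, reactant_smiles TEXT, product_smiles TEXT, "
        "solvent TEXT, temperature_c REAL, yield_percent REAL)"
    )
    # Named placeholders map each dict key to its column; SQLite's REAL
    # column affinity converts the numeric strings to numbers on insert.
    conn.executemany(
        "INSERT INTO reactions VALUES (:rxn_id, :reactant_smiles, "
        ":product_smiles, :solvent, :temperature_c, :yield_percent)",
        rows,
    )
    conn.commit()
    return conn


rows = load_reactions(TABLE_TEXT)
conn = to_sqlite(rows)

# Example query: which reactions in the toy table report a yield above 80 %?
high_yield = conn.execute(
    "SELECT rxn_id FROM reactions WHERE yield_percent > 80"
).fetchall()
```

Because each row is a flat record with one value per column, the same table can be opened in a spreadsheet by a chemist and queried with SQL by a pipeline, which is the kind of dual readability the abstract describes.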