(Semi-) Automatic Review Process for Common Compound Characterization Data in Organic Synthesis

Nicole Jung, Stefan Bräse, Pei-Chi Huang, Chia-Lin Lin, Pierre Tremouilhac, Yu-Chieh Huang, Nils Schlörer, Stefan Kuhn, Markus Götz, Oskar Taubert
DOI: https://doi.org/10.26434/chemrxiv-2024-1r9tb
2024-02-28
Abstract: A method for data review in the chemical sciences is described, focused on the data used to characterize synthetic molecules. Current procedures for data curation in chemistry rely almost exclusively on manual checking or peer review, so a (semi-)automatic procedure for evaluating the data assigned to molecular structures is proposed and demonstrated. The procedure uses the information usually required to identify isolated compounds to check three things: whether the data are complete with respect to the available data types and metadata, whether they are consistent with the proposed structure, and whether they are plausible in comparison with simulated data. Spectra prediction and automatic signal comparison are applied to NMR evaluation, mass spectrometry data are evaluated by signal extraction, and machine learning is used for IR analysis. The proposed protocol shows how integrating different data-analysis tools can help overcome the challenges of the current, purely manual review and curation of data in synthetic chemistry.
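The three levels of checking named in the abstract (completeness, consistency, plausibility) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a compound record stored as a plain dict, and the following choices are all hypothetical: the set of required data types, the 5 ppm mass tolerance, the 5 ppm 13C shift tolerance, the sorted pairwise shift matching, and every function name. The MS step is reduced to looking for the [M+H]+ ion in a peak list. The NMR step compares observed shifts against a list of "predicted" shifts supplied by the caller. The machine-learning IR analysis is not reproduced.

```python
import re

# Monoisotopic masses (u) for common elements; extend as needed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.007276466812

# Hypothetical set of data types a reviewer might require for a new compound.
REQUIRED_TYPES = {"1H NMR", "13C NMR", "MS", "IR"}


def missing_data_types(provided):
    """Completeness: return the required data types that were not submitted."""
    return sorted(REQUIRED_TYPES - set(provided))


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def has_protonated_ion(formula, peaks, tol_ppm=5.0):
    """Consistency: is [M+H]+ among the m/z peaks within tol_ppm?"""
    target = monoisotopic_mass(formula) + PROTON
    return any(abs(mz - target) / target * 1e6 <= tol_ppm for mz in peaks)


def max_shift_deviation(observed, predicted):
    """Plausibility: pair sorted observed and predicted shifts (ppm) and
    return the largest absolute deviation, or None if the signal counts
    differ (e.g. overlapping or symmetry-equivalent signals)."""
    if not observed or len(observed) != len(predicted):
        return None
    return max(abs(o - p) for o, p in zip(sorted(observed), sorted(predicted)))


def review(record, max_dev_ppm=5.0):
    """Run the three checks on one compound record (a plain dict)."""
    dev = max_shift_deviation(record["c13_obs"], record["c13_pred"])
    return {
        "missing": missing_data_types(record["data_types"]),
        "ms_consistent": has_protonated_ion(record["formula"], record["ms_peaks"]),
        "nmr_plausible": dev is not None and dev <= max_dev_ppm,
    }


# Usage with caffeine (C8H10N4O2); the shift lists are illustrative numbers only.
record = {
    "formula": "C8H10N4O2",
    "data_types": ["1H NMR", "13C NMR", "MS"],
    "ms_peaks": [195.0877, 138.0662],
    "c13_obs": [155.4, 151.7, 148.7, 141.4, 107.5, 33.6, 29.7, 27.9],
    "c13_pred": [155.0, 152.3, 149.1, 141.0, 108.2, 33.9, 29.5, 28.1],
}
print(review(record))
# → {'missing': ['IR'], 'ms_consistent': True, 'nmr_plausible': True}
```

Sorting both shift lists before pairing is the simplest possible assignment strategy. A real review tool would need a proper signal assignment and would have to handle differing signal counts, which this sketch only flags as implausible.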
Chemistry
What problem does this paper attempt to address?