LMM Spectrometric Determination of an Organic Compound

Kevin Kawchak
DOI: https://doi.org/10.26434/chemrxiv-2024-qtnkj
2024-08-28
Abstract:Many machine learning models used in academia and industry that identify organic compounds typically lack the ability to converse over prompts and results, and also require expertise across a number of steps to obtain answers. The purpose of this study was primarily to gain insight into the advantages of current unmodified state of the art Large Multimodal Models (LMMs) across several prompts containing multiple spectra of varying difficulty to evaluate the impact of training data, reasoning, and speed. These readily available and easy to use software for the identification of an organic compound based on a molecular formula and spectra were found to be reproducible across three similar LMMs. To the author's best knowledge, this marks the first time that three GPT variants were each able to correctly identify the organic compound quinoline using a variety of different spectroscopic images. The results were obtained using a 2-step process consisting of a) Uploading high resolution spectral images, and b) Submitting a text prompt with the images that requested a compound determination. The main findings were that 1) Four LMMs provided rationale step-by-step interpretations of 1H-NMR, 13C-NMR, and 3 DEPT-NMR spectra from Prompt A, 2) Three of these LMMs, led by a GPT-5 preview model, combined these interpretations into the correct chemical structure with Prompt A, and 3) Two of these LMMs achieved a top score of 5/5 for also generating sequential explanations reflecting the order of the provided spectra along with most of the correct spectral and molecular formula explanations.
Chemistry
What problem does this paper attempt to address?