LMM Chemical Research with Document Retrieval

Kevin Kawchak
DOI: https://doi.org/10.26434/chemrxiv-2024-p91gm
2024-08-13
Abstract: Chemical research progresses more effectively when Large Multimodal Models (LMMs) are combined with Document Retrieval and recently published literature. The methods described here illustrate significant strides over previously tested Large Language Model (LLM) multi-document workflows for characterization assistance and generating new reactions. Here, the 3.5 Sonnet, ScholarGPT, and ChatGPT 4o LMMs each processed either 5 images or 5 supplementary documents from leading 2024 journals. Each of the three models performed inference on a detailed prompt to produce a response that included context from the attachments. The LMMs were not told which of the 5 files contained the answer. The main findings were that 3.5 Sonnet had an average score of 9.8 for images, while two judges awarded high scores to ChatGPT 4o (9.7, 9.4) and ScholarGPT (9.5, 9.4) for document analysis. Judging was performed by a human evaluator for the image uploads, while document processing was evaluated by the Llama 3.1 405B and Nemotron 4 340B LLMs, which correlated well and improved explainability. Highlights include 3.5 Sonnet's ability to interpret a Two-dimensional Nuclear Magnetic Resonance (2D NMR) spectrum accurately, along with Judge Llama 3.1's ability to provide consistently formatted scores with explanations. The results shown here help illustrate AI's continued revitalization of the established chemical research field.
Chemistry
What problem does this paper attempt to address?
The paper explores the application of Large Multimodal Models (LMMs) combined with document retrieval in chemical research. The goal of the study is to evaluate how effectively and accurately LMMs handle chemical images and documents, particularly when assisting with chemical characterization and generating new reactions. Specifically, the paper addresses its research objectives through the following points:

1. **Comparing the performance of different LMMs**: Three LMMs (3.5 Sonnet, ScholarGPT, and ChatGPT 4o) each process five images or five supplementary documents selected from leading 2024 journals, and the study assesses how well each model solves the chemical problems posed.
2. **Refining evaluation criteria**: To ensure the accuracy of the evaluation, the researchers developed detailed evaluation criteria, including whether the correct contextual information was provided, whether the correct image source was identified, and the accuracy of the generated results.
3. **Improving interpretability**: By using workflows such as Retrieval Augmented Generation (RAG) and Direct Document Retrieval (DR), the interpretability of the model outputs was enhanced.
4. **Quantifying performance**: The performance of the models was measured using quantitative scoring. For example, 3.5 Sonnet received an average score of 9.8 in image analysis, while ChatGPT 4o (9.7, 9.4) and ScholarGPT (9.5, 9.4) received high scores from both judges in document analysis.

In summary, this paper aims to verify experimentally the potential of LMMs in chemical research and to show how these models can assist chemists in performing complex analytical tasks.
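
As a rough illustration of the quantitative scoring described above, the following minimal sketch averages the two judge scores that the abstract reports for document analysis and shows the gap between the judges. The dictionary layout, the `summarize` helper, and the labels `judge_a` and `judge_b` are assumptions made for illustration: the abstract does not say which judge gave which score, and this is not the author's evaluation code.

```python
from statistics import mean

# Document-analysis scores as reported in the abstract, one per judge.
# The judge labels are placeholders (assumption): the abstract does not
# state which of the two LLM judges produced which score.
document_scores = {
    "ChatGPT 4o": {"judge_a": 9.7, "judge_b": 9.4},
    "ScholarGPT": {"judge_a": 9.5, "judge_b": 9.4},
}


def summarize(scores):
    """Return each model's mean score and the absolute gap between judges."""
    summary = {}
    for model, by_judge in scores.items():
        values = list(by_judge.values())
        summary[model] = {
            "mean": round(mean(values), 2),
            "judge_gap": round(max(values) - min(values), 2),
        }
    return summary


summary = summarize(document_scores)
for model, stats in summary.items():
    print(f"{model}: mean={stats['mean']}, judge gap={stats['judge_gap']}")
```

A small gap between judges is one simple, informal way to read the abstract's remark that the two LLM judges "correlated well"; a proper correlation would need the per-question scores, which are not given here.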