Automated electrosynthesis reaction mining with multimodal large language models (MLLMs)

Shi Xuan Leong, Sergio Pablo-García, Zijian Zhang, Alán Aspuru-Guzik
DOI: https://doi.org/10.26434/chemrxiv-2024-7fwxv
2024-07-11
Abstract: Leveraging the chemical data that is available in legacy formats such as publications and patents is a significant challenge for the community. Automated reaction mining offers a promising solution to unleash this knowledge into a learnable digital form and therefore help expedite materials and reaction discovery. However, existing reaction mining toolkits are limited to single input modalities (text or images) and cannot effectively integrate heterogeneous data that is scattered across different modalities including text, tables, and figures. In this work, we go beyond single input modalities and explore multimodal large language models (MLLMs) for the analysis of diverse data inputs for automated electrosynthesis reaction mining. We compiled a test dataset of 65 articles and employed it to benchmark five prominent MLLMs against two critical tasks: (i) reaction diagram parsing and (ii) resolving cross-modality data interdependencies. The frontrunner MLLM achieved ≥ 96% accuracy in both tasks, with the strategic integration of single-shot visual prompts and image pre-processing techniques. We integrate this capability into a toolkit named MERMES (Multimodal Reaction Mining pipeline for ElectroSynthesis). Our toolkit functions as an end-to-end MLLM-powered pipeline that integrates article retrieval, information extraction and multimodal analysis for streamlining and automating knowledge extraction. This work lays the groundwork for the increased utilization of MLLMs to accelerate the digitization of chemistry knowledge for data-driven research.
Chemistry
What problem does this paper attempt to address?
The paper addresses how Multimodal Large Language Models (MLLMs) can be used to automatically extract key information about electrosynthesis reactions from the scientific literature. The researchers identify three main challenges:

1. **Digitization of Chemical Knowledge**: Much existing chemical knowledge is stored in legacy formats, such as the HTML and PDF files of publications and patents, which cannot be used directly for data-driven research. Automated reaction mining can convert this knowledge into a learnable digital form and thereby accelerate the discovery of materials and reactions.
2. **Limitations of Unimodal Tools**: Existing reaction mining toolkits rely on a single input modality (text or images) and cannot effectively integrate heterogeneous data scattered across text, tables, and figures.
3. **Complexity and Diversity of the Data**: Reaction conditions are usually dispersed across different parts of an article (main text, supplementary information, tables, figures, and textual descriptions) and are often buried in large amounts of irrelevant content, which makes it difficult to extract accurate experimental data records.

To address these issues, the researchers developed MERMES (Multimodal Reaction Mining pipeline for ElectroSynthesis), an MLLM-powered toolkit built on two key tasks:

1. **Reaction Diagram Parsing**: Extracting reaction conditions from reaction diagrams and classifying them into 10 categories, including anode, cathode, electrolyte/additive, and solvent.
2. **Resolving Cross-Modality Data Interdependencies**: Identifying footnote labels in figures and linking them to their definitions elsewhere in the article, so that the extracted data are consistent and complete.
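The output of the two tasks can be illustrated with a minimal Python sketch of the data they produce. This is not MERMES code: in the paper both tasks are performed by prompting an MLLM, whereas here a regular expression stands in for the footnote-definition step purely for illustration. The names `ReactionConditions`, `parse_footnote_definitions`, and `resolve_footnotes`, the field names, and the assumed bracketed-letter footnote format (`[a] ...`) are all hypothetical. Only four of the ten condition categories are named in this summary, so the rest are collected in an `other` dictionary rather than guessed.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical container for conditions parsed from one reaction diagram.
# Only the four categories named in the summary get explicit fields;
# any remaining categories are stored in `other`.
@dataclass
class ReactionConditions:
    anode: Optional[str] = None
    cathode: Optional[str] = None
    electrolyte_additive: Optional[str] = None
    solvent: Optional[str] = None
    other: Dict[str, str] = field(default_factory=dict)
    footnote_labels: List[str] = field(default_factory=list)  # e.g. ["a", "b"]


# Assumed (hypothetical) convention: each definition is introduced by a
# bracketed lowercase letter, e.g. "[a] ... [b] ...". The lazy group stops
# right before the next "[x]" marker or at the end of the text.
_FOOTNOTE = re.compile(r"\[([a-z])\]\s*(.*?)(?=\s*\[[a-z]\]|\Z)", re.S)


def parse_footnote_definitions(text: str) -> Dict[str, str]:
    """Map each footnote label found in the text to its definition."""
    return {label: body.strip() for label, body in _FOOTNOTE.findall(text)}


def resolve_footnotes(
    record: ReactionConditions, text: str
) -> Tuple[Dict[str, str], List[str]]:
    """Link a record's footnote labels to definitions in the text.

    Returns (resolved, unresolved): labels with a matching definition, and
    labels with none, which a pipeline could flag for manual review.
    """
    definitions = parse_footnote_definitions(text)
    resolved = {
        label: definitions[label]
        for label in record.footnote_labels
        if label in definitions
    }
    unresolved = [lab for lab in record.footnote_labels if lab not in definitions]
    return resolved, unresolved
```

For example, a record carrying labels `a` and `c`, checked against the illustrative caption `[a] Constant current, 10 mA. [b] Isolated yield.`, yields `({'a': 'Constant current, 10 mA.'}, ['c'])`: label `a` is linked to its definition, and `c` is reported as unresolved instead of being silently dropped.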
By combining these tasks with article retrieval and information extraction in an end-to-end pipeline, MERMES integrates and processes multimodal information from the scientific literature to automate reaction mining. This lays the groundwork for wider use of MLLMs to accelerate the digitization of chemistry knowledge for data-driven research.