ChemDFM-X: Towards Large Multimodal Model for Chemistry

Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu

2024-09-20

Abstract: Rapid developments of AI tools are expected to offer unprecedented assistance to the research of natural science including chemistry. However, neither existing unimodal task-specific specialist models nor emerging general large multimodal models (LMM) can cover the wide range of chemical data modality and task categories. To address the real demands of chemists, a cross-modal Chemical General Intelligence (CGI) system, which serves as a truly practical and useful research assistant utilizing the great potential of LMMs, is in great need. In this work, we introduce the first Cross-modal Dialogue Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are generated from an initial modality by approximate calculations and task-specific model predictions. This strategy creates sufficient chemical training corpora, while significantly reducing excessive expense, resulting in an instruction-tuning dataset containing 7.6M data. After instruction finetuning, ChemDFM-X is evaluated on extensive experiments of different chemical tasks with various data modalities. The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.

Machine Learning, Computation and Language, Multimedia

What problem does this paper attempt to address?

The paper addresses a coverage gap in AI tools for chemistry: neither existing unimodal, task-specific specialist models nor emerging general-purpose large multimodal models (LMMs) cover the wide range of chemical data modalities and task categories. Chemical data spans text descriptions, molecular structures, images, and spectra, and chemical tasks range from property prediction to retrosynthetic analysis. Unimodal specialist models can reach state-of-the-art performance on their own tasks, but they cannot handle even slightly different tasks, nor the same task when the input modality changes. This limits their practical usefulness in research and manufacturing. To meet chemists' real needs, the authors argue for a cross-modal Chemical General Intelligence (CGI) system that draws on the potential of LMMs to serve as a practical research assistant. To this end, they propose ChemDFM-X, the first cross-modal dialogue foundation model for chemistry, which aims to understand and interpret data in multiple chemical modalities and to complete multiple downstream tasks with a single set of model weights. In extensive experiments across different chemical tasks and data modalities, ChemDFM-X demonstrates multimodal and cross-modal knowledge comprehension, marking a step toward aligning all modalities in chemistry and toward CGI.
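The abstract's data-construction strategy (deriving several modalities from one initial modality and turning them into instruction-tuning samples) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption, not the authors' pipeline: the `InstructionSample` class, the `expand` function, and the two placeholder generators are hypothetical names, and the generators return stand-in strings where a real system would run approximate calculations or task-specific predictors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InstructionSample:
    """One instruction-tuning record (hypothetical schema)."""
    modality: str      # which derived modality the payload represents
    payload: str       # stand-in for the encoded modality data
    instruction: str   # natural-language prompt shown to the model
    answer: str        # target output


# Hypothetical placeholder generators. A real pipeline would call an
# approximate calculation or a task-specific predictive model here.
def render_image(smiles: str) -> str:
    return f"<image derived from {smiles}>"


def predict_spectrum(smiles: str) -> str:
    return f"<spectrum derived from {smiles}>"


GENERATORS: Dict[str, Callable[[str], str]] = {
    "image": render_image,
    "spectrum": predict_spectrum,
}


def expand(smiles: str,
           generators: Dict[str, Callable[[str], str]]) -> List[InstructionSample]:
    """Derive one sample per modality from a single initial SMILES string."""
    samples = []
    for modality, generate in generators.items():
        samples.append(InstructionSample(
            modality=modality,
            payload=generate(smiles),
            instruction=f"Which molecule does this {modality} correspond to? "
                        "Answer with a SMILES string.",
            answer=smiles,
        ))
    return samples


samples = expand("CCO", GENERATORS)
for sample in samples:
    print(sample.modality, "->", sample.answer)
```

Because every derived sample keeps the initial representation as its answer, one annotated molecule yields a supervised example per modality, which is the sense in which such a strategy can enlarge a training corpus without new experimental measurements.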

