Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

2024-01-30

Abstract: There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.

Machine Learning, Computation and Language, Quantitative Methods

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Integrating Text Information**: Existing machine learning methods focus primarily on the chemical structure of molecules and neglect the large body of chemical knowledge available as text. Incorporating text makes it possible to pursue new drug design objectives, adapt to text-based instructions, and predict complex biological activities.
2. **Multimodal Representation**: The paper proposes a multimodal molecule structure-text model, MoleculeSTM, which jointly learns the chemical structures and textual descriptions of molecules through a contrastive learning strategy. According to the abstract, the model achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
3. **Large-Scale Dataset Construction**: To train MoleculeSTM, the authors construct a large multimodal dataset, PubChemSTM, containing over 280,000 chemical structure-text pairs.
4. **Zero-Shot Task Validation**: The authors design two challenging zero-shot tasks driven by text instructions, structure-text retrieval and molecule editing, to evaluate the effectiveness and utility of MoleculeSTM, and report strong results on both.
5. **Open Vocabulary and Compositionality**: MoleculeSTM has two main properties, open vocabulary and compositionality via natural language, which allow it to handle a wide range of biochemical concepts and to optimize several molecular properties at once from a natural-language description.

In summary, the paper aims to improve multimodal representation and zero-shot generalization in drug discovery by integrating chemical structure with textual knowledge.
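The contrastive learning strategy and the zero-shot retrieval task summarized above can be illustrated with a minimal sketch. It assumes a generic symmetric InfoNCE-style objective over paired embeddings and cosine-similarity retrieval; this page does not state the paper's exact loss, so this is not necessarily the authors' formulation. All function names, the temperature value, and the toy 2-D vectors are hypothetical, and the real model would produce embeddings from a molecule encoder and a text encoder rather than hand-written lists.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_logits(mol_embs, text_embs, temperature=0.1):
    """Cosine similarity of every molecule against every text, divided by temperature."""
    mols = [l2_normalize(v) for v in mol_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(m, t)) / temperature for t in texts]
            for m in mols]

def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row i)[i]: the i-th row should score its own pair highest."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(mol_embs, text_embs, temperature=0.1):
    """Average of the molecule-to-text and text-to-molecule contrastive losses."""
    logits = similarity_logits(mol_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

def retrieve(query_emb, candidate_embs):
    """Zero-shot retrieval: index of the candidate most similar to the query."""
    query = l2_normalize(query_emb)
    scores = [sum(a * b for a, b in zip(query, l2_normalize(c)))
              for c in candidate_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings (hypothetical): molecule i is paired with text i.
mol_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_text_embs = [[0.9, 0.1], [0.1, 0.9]]
shuffled_text_embs = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_info_nce(mol_embs, aligned_text_embs)
loss_shuffled = symmetric_info_nce(mol_embs, shuffled_text_embs)
best_molecule = retrieve(aligned_text_embs[1], mol_embs)
```

In this toy setup the loss is lower when each text embedding sits near its own molecule than when the pairs are shuffled, which is the signal contrastive training optimizes. Once the two embedding spaces are aligned, retrieval needs no task-specific training: a text query is embedded and matched against candidate molecules by similarity.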

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space

Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Towards 3D Molecule-Text Interpretation in Language Models

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild

Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Interactive Molecular Discovery with Natural Language

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

MolScribe: Robust Molecular Structure Recognition with Image-To-Graph Generation

IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

MolFM: A Multimodal Molecular Foundation Model

MolCloze - A Unified Cloze-style Self-supervised Molecular Structure Learning Model for Chemical Property Prediction