Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

Yi Xiao, Xiangxin Zhou, Qiang Liu, Liang Wang

2024-03-07

Abstract: Artificial intelligence has demonstrated immense potential in scientific research. Within molecular science, it is revolutionizing the traditional computer-aided paradigm, ushering in a new era of deep learning. With recent progress in multimodal learning and natural language processing, an emerging trend has targeted building multimodal frameworks that jointly model molecules with textual domain knowledge. In this paper, we present the first systematic survey on multimodal frameworks for molecular research. Specifically, we begin with the development of molecular deep learning and point out the necessity of involving the textual modality. Next, we focus on recent advances in text-molecule alignment methods, categorizing current models into two groups based on their architectures and listing relevant pre-training tasks. Furthermore, we delve into the use of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. Finally, we discuss the limitations in this field and highlight several promising directions for future research.

Biomolecules, Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper focuses on integrating text and molecular information into multimodal frameworks that enhance molecular research, particularly drug discovery. Deep learning has revolutionized traditional computer-aided methods in molecular science, but existing deep learning models have a limited understanding of chemical knowledge and rely on annotated data. The paper argues that recent advances in multimodal learning and natural language processing offer new ways to connect text and molecules. The authors present two main approaches: one treats molecules as a language with its own grammar and uses cross-lingual models to process text and molecules together; the other explores alignment between text and structured molecular data and integrates large language models into the multimodal framework for cross-modal molecular task prediction. The paper also covers prompt engineering techniques, which achieve good results on many molecular tasks without requiring a large amount of pre-training data. It categorizes current work, discusses training strategies, dataset construction methods, and relevant applications, and analyzes the limitations of the field, pointing out several promising directions for future research. Overall, the paper is the first systematic survey of multimodal frameworks in molecular research, aiming to summarize recent progress and outline future research prospects.
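As an illustration of what "alignment between text and molecular data" can mean in practice, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired text and molecule embeddings. The summary above does not specify any particular loss, so this is an assumption chosen for illustration; the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical and stand in for the outputs of real text and molecule encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy_row(logits, target):
    """Softmax cross-entropy for one row, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_alignment_loss(text_emb, mol_emb, temperature=0.1):
    """Symmetric contrastive loss; row i of each list is a matched pair.

    Hypothetical helper for illustration, not taken from the surveyed paper.
    """
    n = len(text_emb)
    # Temperature-scaled similarity between every text and every molecule.
    sim = [[cosine(t, m) / temperature for m in mol_emb] for t in text_emb]
    # Text-to-molecule direction: each text should pick its own molecule.
    t2m = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Molecule-to-text direction: each molecule should pick its own text.
    m2t = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (t2m + m2t)

# Toy embeddings (assumed values): two descriptions and two molecules.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_mol = [[0.9, 0.1], [0.1, 0.9]]   # each molecule near its own text
swapped_mol = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_aligned = contrastive_alignment_loss(text_emb, aligned_mol)
loss_swapped = contrastive_alignment_loss(text_emb, swapped_mol)
print(loss_aligned < loss_swapped)
```

The loss is small when each description lies closest to its own molecule and large when the pairs are mismatched, which is the signal a training procedure would use to pull matched text and molecule representations together.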

Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Towards 3D Molecule-Text Interpretation in Language Models

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals

Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Interactive Molecular Discovery with Natural Language

Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval

MolFM: A Multimodal Molecular Foundation Model

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

MolBind: Multimodal Alignment of Language, Molecules, and Proteins

MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

Molecular Joint Representation Learning via Multi-modal Information

A Survey on Image-text Multimodal Models

Molecular Joint Representation Learning via Multi-modal Information of SMILES and Graphs