MolFM: A Multimodal Molecular Foundation Model

Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zaiqing Nie

2023-07-21

Abstract: Molecular knowledge resides within three different modalities of information sources: molecular structures, biomedical documents, and knowledge bases. Effective incorporation of molecular knowledge from these modalities holds paramount significance in facilitating biomedical research. However, existing multimodal molecular foundation models exhibit limitations in capturing intricate connections between molecular structures and texts, and more importantly, none of them attempt to leverage a wealth of molecular expertise derived from knowledge graphs. In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We propose cross-modal attention between atoms of molecular structures, neighbors of molecule entities and semantically related texts to facilitate cross-modal comprehension. We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule, as well as molecules sharing similar structures or functions. MolFM achieves state-of-the-art performance on various downstream tasks. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively. Furthermore, qualitative analysis showcases MolFM's implicit ability to provide grounding from molecular substructures and knowledge graphs. Code and models are available at https://github.com/BioFM/OpenBioMed.

Biomolecules; Computational Engineering, Finance, and Science; Machine Learning; Chemical Physics

What problem does this paper attempt to address?

The paper addresses two limitations of existing multimodal molecular foundation models in biomedical research: they capture the connections between molecular structures and texts only coarsely, and none of them draws on the molecular knowledge stored in knowledge graphs. To address these issues, the paper proposes a new model, MolFM, built on the following ideas:

1. **Cross-modal Attention Mechanism**: MolFM applies cross-modal attention between the atoms of a molecular structure, the neighbors of the molecule's entity in the knowledge graph, and semantically related texts, so that the model can relate the three modalities to one another.
2. **Joint Representation Learning**: MolFM learns joint representations from molecular structures, biomedical texts, and knowledge graphs, giving a more comprehensive view of molecular information than any single modality.
3. **Theoretical Analysis**: The authors provide a theoretical analysis showing that their cross-modal pre-training minimizes the distance in the feature space between different modalities of the same molecule, as well as between molecules with similar structures or functions.

With these components, MolFM reports state-of-the-art performance on several downstream tasks (cross-modal retrieval, molecular description generation, text-based molecular generation, and molecular property prediction). On cross-modal retrieval, the abstract reports absolute gains of 12.13% and 5.04% over existing models under the zero-shot and fine-tuning settings, respectively.
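The idea of "minimizing the distance in the feature space between different modalities of the same molecule" can be illustrated with a generic symmetric contrastive (InfoNCE-style) objective. The sketch below is only an illustration under stated assumptions: it is not taken from the MolFM code, the summary does not specify the paper's actual loss, and the function names, the `temperature` value, and the toy embeddings are all hypothetical. It uses plain Python lists in place of real encoder outputs.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_matrix(struct_embs, text_embs, temperature):
    """Cosine similarity of every structure/text pair, divided by temperature."""
    s = [normalize(v) for v in struct_embs]
    t = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
            for si in s]


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(struct_embs, text_embs, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's objective).

    Row i of each input is assumed to describe the same molecule, so the
    loss is low when matching pairs are closer than mismatched pairs.
    """
    logits = similarity_matrix(struct_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(transposed))


def retrieve(query_emb, candidate_embs):
    """Rank candidate indices by cosine similarity to the query, best first."""
    q = normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, normalize(c)))
              for c in candidate_embs]
    return sorted(range(len(candidate_embs)), key=lambda i: -scores[i])


# Toy embeddings (hypothetical): three molecules, two modalities.
structures = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
texts_shuffled = [texts_aligned[1], texts_aligned[2], texts_aligned[0]]

loss_aligned = symmetric_contrastive_loss(structures, texts_aligned)
loss_shuffled = symmetric_contrastive_loss(structures, texts_shuffled)
ranking = retrieve(structures[1], texts_aligned)
```

In this toy setup the aligned pairing yields a lower loss than the shuffled one, and the same similarity scores drive retrieval: ranking text embeddings against a structure embedding is the basic operation behind cross-modal retrieval. A real model would compute these embeddings with learned encoders and, per the abstract, would also involve knowledge-graph neighbors, which this sketch omits.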

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

FineMolTex: Towards Fine-grained Molecular Graph-Text Pre-training

MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning

Multi-view biomedical foundation models for molecule-target and property prediction

MolBind: Multimodal Alignment of Language, Molecules, and Proteins

Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

TransFoxMol: predicting molecular property with focused attention

Pretraining Graph Transformer for Molecular Representation with Fusion of Multimodal Information

Multimodal Fusion with Relational Learning for Molecular Property Prediction

MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning
