MolFM: A Multimodal Molecular Foundation Model

Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zaiqing Nie
2023-07-21
Abstract: Molecular knowledge resides within three different modalities of information sources: molecular structures, biomedical documents, and knowledge bases. Effective incorporation of molecular knowledge from these modalities holds paramount significance in facilitating biomedical research. However, existing multimodal molecular foundation models exhibit limitations in capturing intricate connections between molecular structures and texts, and more importantly, none of them attempt to leverage a wealth of molecular expertise derived from knowledge graphs. In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We propose cross-modal attention between atoms of molecular structures, neighbors of molecule entities and semantically related texts to facilitate cross-modal comprehension. We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule, as well as molecules sharing similar structures or functions. MolFM achieves state-of-the-art performance on various downstream tasks. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively. Furthermore, qualitative analysis showcases MolFM's implicit ability to provide grounding from molecular substructures and knowledge graphs. Code and models are available at https://github.com/BioFM/OpenBioMed.
Biomolecules; Computational Engineering, Finance, and Science; Machine Learning; Chemical Physics
What problem does this paper attempt to address?
The paper addresses two limitations of existing multimodal molecular foundation models in biomedical research: they capture the complex relationships between molecular structures and text poorly, and none of them draws on the global knowledge available in knowledge graphs. To address these issues, the paper proposes a new model, MolFM, which relies on the following:

1. **Cross-modal Attention Mechanism**: MolFM applies cross-modal attention between the atoms of a molecular structure, the knowledge-graph neighbors of the molecule entity, and semantically related texts, so that the model can better relate the three modalities.
2. **Joint Representation Learning**: MolFM learns joint representations from molecular structures, biomedical texts, and knowledge graphs, giving a more comprehensive view of molecular information.
3. **Theoretical Analysis**: The authors provide a theoretical analysis showing that their pre-training minimizes the distance in the feature space between different modalities of the same molecule, as well as between molecules with similar structures or functions.

With these components, MolFM reports state-of-the-art results on various downstream tasks (such as cross-modal retrieval, molecular description generation, text-based molecular generation, and molecular property prediction). On cross-modal retrieval, the abstract reports absolute gains of 12.13% in the zero-shot setting and 5.04% in the fine-tuning setting over existing models.
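
To make the idea of "minimizing the distance in the feature space between different modalities of the same molecule" more concrete, here is a minimal sketch of a generic symmetric InfoNCE-style contrastive loss, followed by similarity-based retrieval. It is an illustration under stated assumptions, not MolFM's actual objective or code: the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical, and the real encoders and losses live in the OpenBioMed repository linked in the abstract.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(struct_embs, text_embs):
    """Cosine similarity between every structure embedding and every text embedding."""
    s = [l2_normalize(v) for v in struct_embs]
    t = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]

def _cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def symmetric_info_nce(struct_embs, text_embs, temperature=0.1):
    """Generic contrastive loss (assumed here, not taken from the paper).

    Pair i of struct_embs and text_embs is treated as the same molecule;
    every other pairing in the batch is treated as a negative.
    """
    sim = similarity_matrix(struct_embs, text_embs)
    n = len(sim)
    logits = [[x / temperature for x in row] for row in sim]
    # Structure -> text direction: each row should peak on its diagonal entry.
    s2t = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> structure direction: same check on the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2s = sum(_cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (s2t + t2s)

def retrieve(struct_emb, text_embs):
    """Rank text indices by cosine similarity to one structure embedding."""
    sim = similarity_matrix([struct_emb], text_embs)[0]
    return sorted(range(len(sim)), key=lambda j: -sim[j])

# Toy embeddings (hypothetical): two molecules, each with a matching text.
struct_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [0.1, 0.9]]
mismatched_text_embs = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = symmetric_info_nce(struct_embs, text_embs)
mismatched_loss = symmetric_info_nce(struct_embs, mismatched_text_embs)
ranking = retrieve(struct_embs[0], text_embs)
```

In this toy setup the loss is lower when each structure embedding lies close to its own text embedding than when the pairs are swapped, which is the behavior a contrastive pre-training objective rewards. Cross-modal retrieval then reduces to ranking candidates by similarity in the shared space.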