Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar
2024-01-30
Abstract: There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
Machine Learning, Computation and Language, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Integrating Text Information**: Existing machine learning methods focus primarily on the chemical structure of molecules and neglect the large body of textual knowledge available in chemistry. Incorporating text makes it possible to pursue new drug design objectives, adapt to text-based instructions, and predict complex biological activities.
2. **Multimodal Representation**: The paper proposes a multimodal molecule structure-text model, MoleculeSTM, which jointly learns the chemical structures and textual descriptions of molecules through a contrastive learning strategy. The authors report state-of-the-art generalization to novel biochemical concepts across various benchmarks.
3. **Large-Scale Dataset Construction**: To train MoleculeSTM, the authors constructed a large multimodal dataset, PubChemSTM, containing over 280,000 chemical structure-text pairs.
4. **Zero-Shot Task Validation**: Two challenging zero-shot tasks driven by text instructions, structure-text retrieval and molecule editing, were designed to demonstrate the effectiveness and utility of MoleculeSTM.
5. **Open Vocabulary and Compositionality**: MoleculeSTM is open-vocabulary and compositional via natural language, so a wide range of biochemical concepts can be expressed, and several target properties can be combined, in a single text description.

In summary, the paper aims to improve multimodal representation and zero-shot generalization in drug discovery by integrating chemical structure with textual knowledge.
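The contrastive learning strategy and the zero-shot retrieval task mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a symmetric InfoNCE-style loss over cosine similarities, which is a common choice for structure-text alignment, and the function names (`symmetric_contrastive_loss`, `retrieve`), the `temperature` value, and the toy 2-dimensional embeddings are all hypothetical. In a real system the embeddings would come from a molecule encoder and a text encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(struct_embs, text_embs):
    """Cosine similarity between every structure embedding and every text embedding."""
    s = [l2_normalize(v) for v in struct_embs]
    t = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i (the paired item)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(struct_embs, text_embs, temperature=0.1):
    """Assumed InfoNCE-style loss: average of structure-to-text and text-to-structure terms."""
    sim = similarity_matrix(struct_embs, text_embs)
    logits = [[x / temperature for x in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the text-to-structure direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def retrieve(struct_emb, candidate_text_embs):
    """Zero-shot retrieval sketch: index of the candidate text most similar to the structure."""
    sims = similarity_matrix([struct_emb], candidate_text_embs)[0]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy embeddings (hypothetical): pair i of structures matches pair i of texts.
    structures = [[1.0, 0.0], [0.0, 1.0]]
    texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
    texts_shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned loss: ", symmetric_contrastive_loss(structures, texts_aligned))
    print("shuffled loss:", symmetric_contrastive_loss(structures, texts_shuffled))
    print("retrieved index:", retrieve([1.0, 0.1], texts_shuffled))
```

The loss is low when each structure is most similar to its own description and high when the pairs are mismatched, which is the property that lets a trained model rank unseen descriptions for a given molecule without task-specific fine-tuning.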