Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models

Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun

2024-07-23

Abstract: Recently, there has been a growing interest among researchers in understanding molecules and their textual descriptions through molecule language models (MoLM). However, despite some early promising developments, the advancement of MoLM still trails significantly behind that of vision language models (VLM). This is because unique challenges exist apart from VLM in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Specifically, AMOLE enriches molecule-text pairs by sharing descriptions among structurally similar molecules with a novel structural similarity preserving loss. Moreover, we propose an expertise reconstruction loss to transfer knowledge from molecules that have extensive expertise to those with less expertise. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery. The source code for AMOLE is available at https://github.com/Namkyeong/AMOLE.
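
The idea of sharing descriptions among structurally similar molecules can be illustrated with a small sketch. This is not the AMOLE implementation: the fingerprints below are hypothetical sets of on-bits (in practice they would come from a cheminformatics toolkit), and the names `share_descriptions`, `k`, and `min_sim` are assumptions made for illustration. Tanimoto similarity is used here as one common measure of structural similarity.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def share_descriptions(fingerprints, descriptions, k=1, min_sim=0.5):
    """Return (molecule, description) pairs: each molecule keeps its own
    description and borrows those of up to k sufficiently similar molecules.

    k and min_sim are illustrative knobs, not values from the paper.
    """
    pairs = []
    for mol, fp in fingerprints.items():
        pairs.append((mol, descriptions[mol]))  # original pair
        # Score every other molecule by structural similarity to `mol`.
        scored = [
            (tanimoto(fp, other_fp), other)
            for other, other_fp in fingerprints.items()
            if other != mol
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        for sim, other in scored[:k]:
            if sim >= min_sim:
                pairs.append((mol, descriptions[other]))  # augmented pair
    return pairs


# Toy data: hypothetical fingerprints and descriptions.
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {7, 8, 9},
}
descriptions = {
    "mol_a": "description of A",
    "mol_b": "description of B",
    "mol_c": "description of C",
}
pairs = share_descriptions(fingerprints, descriptions)
```

In this toy setting, `mol_a` and `mol_b` exchange descriptions because they share most of their bits, while `mol_c` has no sufficiently similar neighbour and keeps only its own description.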

Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses two challenges specific to molecule language models (MoLM):

1. **Insufficient Data**: Compared with vision language models (VLM), the amount of molecule-text paired data is limited, which restricts the development of MoLM. Producing molecule-text pairs requires expensive domain expertise and a significant amount of wet-lab time, so the data volume cannot be increased simply through web crawling.
2. **Lack of Expertise**: Each molecule is typically described by experts from different fields, and because these experts focus on different aspects, many molecules lack descriptions from some of them. For example, in the PubChem database, only about 27K molecules have descriptions from multiple experts, while the remaining 272K molecules have descriptions from only one expert.

To address these issues, the paper proposes AMOLE, which has two main components:

- **Structural Similarity Preserving Loss**: AMOLE augments the molecule-text paired data by sharing descriptions among structurally similar molecules, and introduces a novel loss function so that the augmented pairs are aligned in the representation space according to their structural similarity.
- **Expertise Transfer Module**: Using an expertise reconstruction loss, AMOLE transfers knowledge from molecules with extensive descriptions to those with fewer, which improves the model's handling of molecules whose descriptions are incomplete.

With these strategies, AMOLE outperforms prior methods on various downstream tasks and shows potential for application in real-world drug discovery.
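
One way to read the structural similarity preserving loss is as a contrastive objective with soft targets: rather than treating an augmented pair as a hard positive, the target weight of each candidate molecule follows its structural similarity to the molecule the description was written for. The sketch below follows that reading only. The exact formulation in the paper is not given in this summary, so the function names, the temperature `tau`, and the toy embeddings are all assumptions.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similarity_preserving_loss(text_emb, mol_embs, structural_sims, tau=0.1):
    """Cross-entropy between a structure-derived target distribution and the
    model's text-to-molecule similarity distribution (illustrative only).

    structural_sims[i] is the structural similarity (e.g. Tanimoto) between
    candidate molecule i and the molecule the text originally described.
    """
    target = softmax(structural_sims, tau)
    predicted = softmax([cosine(text_emb, m) for m in mol_embs], tau)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Toy example: three candidate molecules with decreasing structural similarity.
structural_sims = [1.0, 0.6, 0.0]
text_emb = [1.0, 0.0]

# Embeddings whose ordering agrees with the structural similarities...
aligned = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
# ...and embeddings whose ordering is reversed.
misaligned = [[0.0, 1.0], [0.8, 0.6], [1.0, 0.0]]

loss_aligned = similarity_preserving_loss(text_emb, aligned, structural_sims)
loss_misaligned = similarity_preserving_loss(text_emb, misaligned, structural_sims)
```

Under this sketch the loss is lower when the representation space orders molecules the same way their structures do, which is the behaviour the summary attributes to the loss.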

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Towards 3D Molecule-Text Interpretation in Language Models

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

Chemical Language Model Linker: blending text and molecules with modular adapters

Benchmarking Large Language Models for Molecule Prediction Tasks

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Can Large Language Models Empower Molecular Property Prediction?

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

DrugAssist: A Large Language Model for Molecule Optimization

ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation

Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

LLaMo: Large Language Model-based Molecular Graph Assistant

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation