Abstract: Molecule discovery is a pivotal research field, impacting everything from the medicines we take to the materials we use. Recently, Large Language Models (LLMs) have been widely adopted in molecule understanding and generation, yet the alignments between molecules and their corresponding captions remain a significant challenge. Previous endeavours often treat the molecule as a general SMILES string or molecular graph, neglecting the fine-grained alignments between the molecular sub-structures and the descriptive textual phrases, which are crucial for accurate and explainable predictions. To address this, we introduce MolReFlect, a novel teacher-student framework designed to contextually perform the molecule-caption alignments in a fine-grained way. Our approach initially leverages a larger teacher LLM to label the detailed alignments by directly extracting critical phrases from molecule captions or SMILES strings and mapping them to the corresponding sub-structures or characteristics. To refine these alignments, we propose In-Context Selective Reflection, which retrieves previous extraction results as context examples for the teacher LLM to reflect on, and lets a smaller student LLM select between the in-context reflection and the previous extraction results. Finally, we enhance the learning process of the student LLM through Chain-of-Thought In-Context Molecule Tuning, integrating the fine-grained alignments and the reasoning processes within the Chain-of-Thought format. Our experimental results demonstrate that MolReFlect enables LLMs like Mistral-7B to significantly outperform the previous baselines, achieving SOTA performance on the ChEBI-20 dataset. This advancement not only enhances the generative capabilities of LLMs in the molecule-caption translation task, but also contributes to a more explainable framework.
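In-Context Selective Reflection starts by retrieving previous extraction results for similar molecules to use as context examples. The following is a minimal sketch of that retrieval step. It assumes similarity is measured as Tanimoto similarity between molecular fingerprints. That measure is an assumption made here, because the abstract does not state the retrieval criterion. The fingerprints are toy sets of on-bit indices, and the function names `tanimoto` and `retrieve_context` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(a | b)


def retrieve_context(query_fp: Set[int],
                     pool: Dict[str, Set[int]],
                     k: int = 2) -> List[Tuple[str, float]]:
    """Return the k pool entries most similar to the query, best first.

    Assumption: retrieval ranks candidates by fingerprint similarity alone.
    """
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy fingerprints (not real Morgan/MACCS bits).
pool = {
    "example_A": {1, 4, 7, 9},
    "example_B": {2, 3, 5},
    "example_C": {1, 4, 8},
}
query = {1, 4, 7}
print(retrieve_context(query, pool, k=2))
```

With these toy sets the query shares 3 of 4 bits with `example_A` (0.75) and 2 of 4 with `example_C` (0.5), so those two would be used as context examples. Real systems would compute fingerprints with a cheminformatics toolkit rather than hand-written bit sets.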

What problem does this paper attempt to address?

This paper addresses the problem of fine-grained alignment between molecules and texts. Existing methods usually treat a molecule as a whole SMILES string or molecular graph and ignore the fine-grained alignment between molecular sub-structures and descriptive text phrases, which is crucial for accurate and interpretable predictions. The paper introduces MolReFlect, a teacher-student framework that performs fine-grained molecule-text alignment in context, with the aim of improving performance on molecule-text translation tasks.

### Main Contributions

1. **Fine-grained Alignment**: MolReFlect extracts fine-grained alignments between molecules and texts without manual annotation, offering a way to ease the data-hungry problem in the biochemistry field.
2. **Interpretability**: By integrating fine-grained alignments into the fine-tuning of the LLM, MolReFlect yields a more interpretable framework and helps the LLM better understand the translation between molecules and texts.
3. **Performance Improvement**: MolReFlect achieves state-of-the-art performance on molecule-text translation tasks without introducing additional modalities or complex structures, further demonstrating the importance of in-context fine-grained alignment.

### Method Overview

The MolReFlect framework consists of three main stages:

1. **Zero-shot Alignment Extraction**: A larger teacher LLM extracts key phrases from molecular SMILES representations or molecule captions and aligns them with the corresponding characteristics or sub-structure patterns.
2. **In-Context Selective Reflection**: Similar samples and their zero-shot alignments are retrieved as context examples, and the teacher LLM reflects on and refines its response. The smaller student LLM then selects either the zero-shot alignment or the reflected alignment according to perplexity.
3. **Chain-of-Thought In-Context Molecule Tuning**: The context examples are reformatted into input-alignment-target chains of thought, and the LLM is fine-tuned to make use of its reasoning ability.

### Experimental Results

MolReFlect significantly outperforms the baseline methods on the molecule-text translation tasks of the ChEBI-20 dataset, namely molecule caption generation (Mol2Cap) and text-based de novo molecule generation (Cap2Mol). Specifically:

- **Mol2Cap Task**: MolReFlect obtains the highest scores on all evaluation metrics, with BLEU-2 and BLEU-4 increasing by 3.8% and 4.6%, respectively.
- **Cap2Mol Task**: MolReFlect performs strongly on BLEU, exact match rate and molecular fingerprint similarity, and the generated molecules are highly similar to the ground-truth molecules.

### Ablation Study

The ablation study confirms that fine-grained alignment improves performance on molecule-text translation tasks. Removing the context examples and the fine-grained alignments causes a significant performance drop, which shows that fine-grained alignment matters for the final generation quality.

In conclusion, MolReFlect significantly improves molecule-text translation through fine-grained alignment and in-context learning, providing new ideas and tools for future molecular discovery research.
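The second and third stages of the method can be illustrated with a minimal sketch. The first part shows how a student model can choose between a zero-shot and a reflected alignment by perplexity, using PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The second part shows how a context example is laid out as an input-alignment-target chain. Several things here are assumptions and do not come from the paper:

- The log-probabilities are taken to be the student's per-token scores for the ground-truth target, conditioned on the input and each candidate alignment.
- The lower-perplexity candidate is kept.
- The prompt template and all function names (`perplexity`, `select_alignment`, `format_cot_example`, `build_prompt`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(-mean log p) over the scored tokens (natural-log probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_alignment(zero_shot: str, reflected: str,
                     zero_shot_logprobs: Sequence[float],
                     reflected_logprobs: Sequence[float]) -> str:
    """Keep the candidate the student scores with lower perplexity.

    Assumption: lower perplexity means the alignment helps the student more;
    ties keep the zero-shot alignment.
    """
    if perplexity(reflected_logprobs) < perplexity(zero_shot_logprobs):
        return reflected
    return zero_shot


def format_cot_example(inp: str, alignment: str, target: str) -> str:
    """Lay out one context example as an input -> alignment -> target chain (hypothetical template)."""
    return f"Input: {inp}\nAlignment: {alignment}\nTarget: {target}"


def build_prompt(context: List[Tuple[str, str, str]],
                 query_input: str, query_alignment: str) -> str:
    """Concatenate context chains, then leave the query's target open for generation."""
    blocks = [format_cot_example(*example) for example in context]
    blocks.append(f"Input: {query_input}\nAlignment: {query_alignment}\nTarget:")
    return "\n\n".join(blocks)


# Hypothetical usage: toy log-probabilities, not model outputs.
context = [("CCO", "hydroxyl group (-OH) -> alcohol",
            "The molecule is a primary alcohol.")]
chosen = select_alignment("zero-shot alignment", "reflected alignment",
                          [-1.2, -0.9, -1.5], [-0.4, -0.7, -0.6])
print(chosen)  # reflected alignment
print(build_prompt(context, "CC(=O)O", chosen))
```

In the usage above the reflected candidate has a mean log-probability of about -0.57 (PPL ≈ 1.76), while the zero-shot candidate has -1.2 (PPL ≈ 3.32), so the reflected alignment is kept. Keeping the target as the last field means the model produces the alignment-conditioned answer only after the reasoning step, which mirrors the input-alignment-target ordering described for the third stage.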

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

An Image-enhanced Molecular Graph Representation Learning Framework

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation


MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Large Language Models are In-Context Molecule Learners

Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval

Towards 3D Molecule-Text Interpretation in Language Models

Chemical-Reaction-Aware Molecule Representation Learning

MolTC: Towards Molecular Relational Modeling In Language Models

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Molecular contrastive learning of representations via graph neural networks

FineMolTex: Towards Fine-grained Molecular Graph-Text Pre-training

Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models