LLaMo: Large Language Model-based Molecular Graph Assistant

Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim
2024-10-31
Abstract: Large Language Models (LLMs) have demonstrated remarkable generalization and instruction-following capabilities with instruction tuning. The advancements in LLMs and instruction tuning have led to the development of Large Vision-Language Models (LVLMs). However, the competency of the LLMs and instruction tuning have been less explored in the molecular domain. Thus, we propose LLaMo: Large Language Model-based Molecular graph assistant, which is an end-to-end trained large molecular graph-language model. To bridge the discrepancy between the language and graph modalities, we present the multi-level graph projector that transforms graph representations into graph tokens by abstracting the output representations of each GNN layer and motif representations with the cross-attention mechanism. We also introduce machine-generated molecular graph instruction data to instruction-tune the large molecular graph-language model for general-purpose molecule and language understanding. Our extensive experiments demonstrate that LLaMo shows the best performance on diverse tasks, such as molecular description generation, property prediction, and IUPAC name prediction. The code of LLaMo is available at https://github.com/mlvlab/LLaMo.
Machine Learning, Artificial Intelligence, Molecular Networks
What problem does this paper attempt to address?
The paper asks how large language models (LLMs) and instruction tuning can be applied to molecular graphs to improve tasks such as molecular description generation, property prediction, and IUPAC name prediction. It proposes a framework called LLaMo (Large Language Model-based Molecular Graph Assistant), which addresses the shortcomings of existing methods in four ways:

1. **Multimodal Data Processing**: Existing molecular graph models are limited on tasks that pair text with molecules; in particular, they lack interpretability and multimodal compatibility. By combining a molecular graph encoder with a large language model, LLaMo can handle these multimodal tasks better.
2. **Multi-level Graph Projector**: To bridge the gap between the language and graph modalities, the paper introduces a multi-level graph projector that converts graph representations into graph tokens. It does so by abstracting the output representations of each GNN layer, together with motif representations, through cross-attention, so that the language model sees molecular structure at several levels of detail.
3. **Machine-generated Instruction Data**: To strengthen instruction following, the paper uses a pipeline that converts molecular descriptions and IUPAC names into multi-turn dialogues, which serve as molecular graph instruction data.
4. **End-to-end Training**: LLaMo is trained end to end, which the paper reports improves performance on molecular description generation, property prediction, and IUPAC name prediction.

In summary, the main goal of the paper is to design a framework, LLaMo, that combines a multi-level graph projector with machine-generated instruction data to improve the performance of large language models in the molecular graph domain.
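To make the multi-level graph projector idea more concrete, here is a minimal NumPy sketch of cross-attention in which a small set of query tokens attends over node representations from every GNN layer plus motif representations. It is an illustration under stated assumptions, not the authors' implementation: the function name `multi_level_projector`, the single-head attention, the use of one shared set of queries, the random weights, and all dimensions are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_level_projector(layer_outputs, motif_reprs, queries, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention projector.

    layer_outputs: list of (num_nodes, d) arrays, one per GNN layer.
    motif_reprs:   (num_motifs, d) array of motif representations.
    queries:       (num_tokens, d) array; learnable in a real model (assumed).
    Returns (graph_tokens, attention_weights).
    """
    # Pool every level into one key/value set: all layers' nodes, then motifs.
    keys_values = np.concatenate(list(layer_outputs) + [motif_reprs], axis=0)
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    # Scaled dot-product attention: each query token mixes all levels.
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return attn @ v, attn


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
num_layers, num_nodes, num_motifs, d, num_tokens = 3, 5, 2, 8, 4
layer_outputs = [rng.normal(size=(num_nodes, d)) for _ in range(num_layers)]
motif_reprs = rng.normal(size=(num_motifs, d))
queries = rng.normal(size=(num_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

graph_tokens, attn = multi_level_projector(
    layer_outputs, motif_reprs, queries, w_q, w_k, w_v
)
print(graph_tokens.shape)  # (4, 8): a fixed number of graph tokens
```

The number of output tokens depends only on the number of queries, not on the size of the molecule, which is what lets a variable-sized graph be handed to a language model as a fixed-length sequence of graph tokens.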