Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning

Sakhinana Sagar Srinivas, Venkataramana Runkana
2024-08-27
Abstract: In the field of chemistry, the objective is to create novel molecules with desired properties, facilitating accurate property predictions for applications such as material design and drug screening. However, existing graph deep learning methods face limitations that curb their expressive power. To address this, we explore the integration of vast molecular domain knowledge from Large Language Models (LLMs) with the complementary strengths of Graph Neural Networks (GNNs) to enhance performance in property prediction tasks. We introduce a Multi-Modal Fusion (MMF) framework that synergistically harnesses the analytical prowess of GNNs and the linguistic generative and predictive abilities of LLMs, thereby improving accuracy and robustness in predicting molecular properties. Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting. Furthermore, our approach effectively addresses distributional shifts, a common challenge in real-world applications, and showcases the efficacy of learning cross-modal representations, surpassing state-of-the-art baselines on benchmark datasets for property prediction tasks.
Machine Learning
What problem does this paper attempt to address?
The paper addresses chemical property prediction, in particular the accurate prediction of molecular properties for applications such as material design and drug screening. Existing graph deep learning methods are limited in expressive power, so the authors combine the strengths of large language models (LLMs) and graph neural networks (GNNs). Specifically, the paper proposes a Multi-Modal Fusion (MMF) framework that pairs the ability of GNNs to model graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, with the aim of improving the accuracy and robustness of molecular property prediction.

The MMF framework has three components:

1. **Multi-modal Semantic Fusion**: A five-step method generates cross-modal embeddings. The steps include using LLMs to generate technical descriptions of chemical SMILES representations and computing context-aware text embeddings by fine-tuning small language models (LMs).
2. **In-context Learning (ICL)**: LLMs are guided to predict molecular properties from a few input-output examples, producing prediction embeddings without explicit fine-tuning.
3. **Mixture of Experts (MoE)**: A gating mechanism integrates the cross-modal embeddings and the prediction embeddings, and the resulting unified embedding is optimized for high-accuracy predictions.

According to the authors, the MMF framework surpasses state-of-the-art baselines on multiple public molecular property prediction datasets, improving prediction accuracy while reducing the risk of overfitting. The authors position this work as supporting further progress in molecular science and technology.
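Steps 2 and 3 above can be made concrete with a minimal sketch. It is not the authors' implementation: the prompt template, the function names (`build_few_shot_prompt`, `gated_fusion`), the two-expert softmax gate, and the embedding shapes are all assumptions chosen for illustration, and the property values in the usage lines are made-up placeholders rather than measurements.

```python
import numpy as np


def build_few_shot_prompt(examples, query_smiles, property_name):
    """Format (SMILES, value) demonstrations followed by the query molecule.

    Hypothetical template: the summary does not specify the prompt wording.
    """
    blocks = [f"Predict the {property_name} of each molecule."]
    for smiles, value in examples:
        blocks.append(f"SMILES: {smiles}\n{property_name}: {value}")
    # The query is left open so the LLM completes the property value.
    blocks.append(f"SMILES: {query_smiles}\n{property_name}:")
    return "\n\n".join(blocks)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def gated_fusion(cross_modal, prediction, gate_weights, gate_bias):
    """Blend two embeddings with a learned softmax gate (assumed form).

    cross_modal, prediction: arrays of shape (batch, d)
    gate_weights: array of shape (2 * d, 2); gate_bias: array of shape (2,)
    Returns the fused embedding (batch, d) and the gate values (batch, 2).
    """
    gate_input = np.concatenate([cross_modal, prediction], axis=-1)
    gates = softmax(gate_input @ gate_weights + gate_bias)
    # Each row of `gates` sums to 1, so the result is a convex combination.
    fused = gates[:, :1] * cross_modal + gates[:, 1:] * prediction
    return fused, gates


# Usage with placeholder data (values are illustrative only).
demo_prompt = build_few_shot_prompt(
    [("CCO", 0.0), ("CCN", 1.0)], "c1ccccc1", "property"
)
rng = np.random.default_rng(0)
z_cross = rng.normal(size=(4, 8))   # stand-in for cross-modal embeddings
z_pred = rng.normal(size=(4, 8))    # stand-in for ICL prediction embeddings
W = rng.normal(scale=0.1, size=(16, 2))
b = np.zeros(2)
z_fused, gate_values = gated_fusion(z_cross, z_pred, W, b)
```

In a trained model the gate parameters would be learned jointly with the downstream predictor; here they are random so the example runs on its own.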