Can Large Language Models Empower Molecular Property Prediction?

Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, Yong Liu

2023-07-15

Abstract: Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule can be represented either as graph-structured data or as SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to do in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then use them to fine-tune a small-scale LM for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Code is available at https://github.com/ChnQ/LLM4Mol.

Machine Learning, Artificial Intelligence, Quantitative Methods

What problem does this paper attempt to address?

The paper explores how large language models (LLMs) can be applied to molecular property prediction. The authors investigate this from two perspectives:

1. **Zero-shot/Few-shot Molecular Classification**: With suitably designed prompts, an LLM uses its in-context learning ability to classify molecules directly, without any parameter updates.
2. **Generating New Molecular Representations**: An LLM generates a detailed textual description for each molecule's Simplified Molecular Input Line Entry System (SMILES) string, covering information such as the molecule's functional groups and chemical properties. These descriptions then serve as new representations of the molecules for downstream tasks, an approach the paper calls "Caption as new Representation" (CaR).

Experimental results show that, on multiple benchmark datasets, this new method achieves significantly better performance than traditional Graph Neural Networks (GNNs) and SMILES-based methods under random split settings.

The paper also discusses limitations and future research directions, such as exploring more diverse large language models, making better use of the graph structure information of molecules, and handling macromolecules that cannot be represented by SMILES.
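The two perspectives can be illustrated with a minimal prompt-construction sketch. This is not the authors' code: the function names `build_few_shot_prompt` and `build_caption_prompt`, the prompt wording, and the example labels are all hypothetical, chosen only to show the general shape of in-context classification and caption generation. No LLM is called here; the strings would be passed to whichever model is being evaluated.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task: str,
    examples: List[Tuple[str, str]],
    query_smiles: str,
) -> str:
    """Assemble an in-context classification prompt (hypothetical format).

    An empty `examples` list yields a zero-shot prompt.
    """
    lines = [f"Task: {task}", "Answer with a single label.", ""]
    for smiles, label in examples:
        # Each demonstration pairs a SMILES string with its known label.
        lines.append(f"SMILES: {smiles}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query molecule is left unlabeled for the model to complete.
    lines.append(f"SMILES: {query_smiles}")
    lines.append("Label:")
    return "\n".join(lines)


def build_caption_prompt(smiles: str) -> str:
    """Ask for a textual explanation to use as the molecule's representation."""
    return (
        "Describe the functional groups and chemical properties of the "
        f"molecule with SMILES {smiles}."
    )


# Placeholder demonstrations: the labels "0" and "1" are made up for
# illustration and do not come from any benchmark dataset.
demo_examples = [("CCO", "0"), ("c1ccccc1", "1")]
few_shot = build_few_shot_prompt(
    "Classify the molecule for the target property.", demo_examples, "CC(=O)O"
)
zero_shot = build_few_shot_prompt(
    "Classify the molecule for the target property.", [], "CC(=O)O"
)
caption_request = build_caption_prompt("CC(=O)O")
```

In a CaR-style pipeline, the text returned for `caption_request` would replace the raw SMILES string as the input used to fine-tune a smaller language model on the downstream task.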
