LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions

Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng
2023-10-21
Abstract: The prediction of crystal properties plays a crucial role in the crystal design process. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). Although GNNs are powerful, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. One of the main reasons is the lack of publicly available data for this task. In this paper, we develop and make public a benchmark dataset (called TextEdge) that contains text descriptions of crystal structures with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict the physical and electronic properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based crystal property predictor by about 4% in predicting band gap, 3% in classifying whether the band gap is direct or indirect, and 66% in predicting unit cell volume. LLM-Prop also outperforms a finetuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. Our empirical results may highlight the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.
Computation and Language, Materials Science
What problem does this paper attempt to address?
The paper addresses crystal property prediction. Existing methods rely mainly on graph neural networks (GNNs). Although GNNs are powerful, accurately modeling the complex interactions between atoms and molecules within a crystal remains difficult for them. Predicting crystal properties from text descriptions, by contrast, has received little attention, primarily because no public dataset existed for the task. To address these issues, the paper makes the following contributions:

1. **Dataset**: A benchmark dataset named TextEdge is collected and released. It contains approximately 144,000 text descriptions of crystals along with their physical and electronic properties.
2. **New Method**: LLM-Prop, a method based on large language models (LLMs), predicts the physical and electronic properties of crystals from their text descriptions. Experimental results show that LLM-Prop outperforms the state-of-the-art GNN-based predictor at predicting band gap, predicting unit cell volume, and classifying whether the band gap is direct or indirect.

The main objective of the paper is to demonstrate that crystal properties can be predicted effectively from text descriptions, and to show experimentally that this approach outperforms GNN-based methods on the three tasks listed above.
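To make the task setup concrete, the following is a minimal sketch of how one record pairing a text description with its properties could be turned into a (text, label) training example for each of the three prediction tasks. It is not the TextEdge schema or the authors' code: the class `CrystalRecord`, its field names, the `TASKS` mapping, and the helper `to_example` are all hypothetical, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CrystalRecord:
    """Hypothetical record layout; field names are assumptions, not the real schema."""
    description: str     # free-text description of the crystal structure
    band_gap: float      # band gap (regression target)
    is_gap_direct: bool  # True if the band gap is direct (classification target)
    volume: float        # unit cell volume (regression target)


# The three tasks named in the abstract, mapped to an assumed task type.
TASKS = {
    "band_gap": "regression",
    "is_gap_direct": "classification",
    "volume": "regression",
}


def to_example(record: CrystalRecord, target: str) -> Tuple[str, float, str]:
    """Return (input text, numeric label, task type) for one target property."""
    if target not in TASKS:
        raise KeyError(f"unknown target property: {target!r}")
    # Booleans become 0.0 / 1.0 so every task shares one numeric label type.
    label = float(getattr(record, target))
    return record.description, label, TASKS[target]


if __name__ == "__main__":
    # Placeholder values only; not taken from any dataset.
    sample = CrystalRecord("placeholder crystal description", 1.0, True, 100.0)
    for name in TASKS:
        print(to_example(sample, name))
```

Keeping the description as the only model input mirrors the paper's framing, in which the properties are predicted from text alone rather than from a graph of the crystal structure.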