Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

Zhilong Song, Shuaihua Lu, Minggang Ju, Qionghua Zhou, Jinlan Wang
2024-07-10
Abstract: Assessing the synthesizability of crystal structures is pivotal for advancing the practical application of theoretical material structures designed by machine learning or high-throughput screening. However, a significant gap exists between actual synthesizability and thermodynamic or kinetic stability, which is commonly used to screen theoretical structures for experiments. To address this, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which includes three LLMs for predicting synthesizability, synthesis methods, and precursors. We create a comprehensive synthesizability dataset of 140,120 crystal structures and develop an efficient text representation method for crystal structures to fine-tune the LLMs. The Synthesizability LLM achieves a remarkable 98.6% accuracy, significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability by 106.1% and 44.5%, respectively. The Methods LLM achieves a classification accuracy of 91.02%, and the Precursors LLM has an 80.2% success rate in predicting synthesis precursors. Furthermore, we develop a user-friendly graphical interface that enables automatic predictions of synthesizability and precursors from uploaded crystal structure files. Through these contributions, CSLLM bridges the gap between theoretical material design and experimental synthesis, paving the way for the rapid discovery of novel and synthesizable functional materials.
Materials Science
What problem does this paper attempt to address?
This paper addresses how to predict whether a theoretical crystal structure can be synthesized and, if so, by which method and from which precursors. Conventional screening relies on thermodynamic or kinetic stability, but stability is an imperfect proxy for actual synthesizability and cannot reliably indicate whether synthesis is feasible. To close this gap, the research team developed a framework called Crystal Synthesis Large Language Models (CSLLM), which comprises three large language models: the Synthesizability LLM, which predicts whether a structure is synthesizable; the Methods LLM, which classifies the synthesis method; and the Precursors LLM, which predicts precursors. The researchers took a large number of experimentally validated, synthesizable crystal structures from a database as positive samples and used machine learning models to identify non-synthesizable structures as negative samples, creating a balanced dataset. They also proposed an efficient text representation, called the "material string", that encodes crystal structure information as text for fine-tuning. After fine-tuning, the Synthesizability LLM achieved 98.6% accuracy on the test set, significantly outperforming traditional screening based on thermodynamic and kinetic stability. The Methods LLM reached a classification accuracy of 91.02%, and the Precursors LLM had an 80.2% success rate in predicting precursors. In addition, the paper uses reaction energy calculations and combination analysis to improve the reliability and practicality of the predictions. A user-friendly graphical interface was also developed so that users can upload crystal structure files and obtain synthesizability and precursor predictions automatically. The CSLLM framework thus bridges the gap between theoretical material design and experimental synthesis, contributing to the rapid discovery of novel functional materials.
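To make the idea of a text representation for crystal structures more concrete, here is a minimal Python sketch that serializes a structure into one compact line. The exact layout of the paper's "material string" is not given in this summary, so the format below (space group, lattice parameters, then symmetry-distinct sites) and the names `Site`, `Crystal`, and `to_text` are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    element: str                               # chemical symbol, e.g. "Na"
    frac_coords: Tuple[float, float, float]    # fractional coordinates


@dataclass
class Crystal:
    space_group: int                           # international space group number
    lattice: Tuple[float, float, float, float, float, float]  # a, b, c, alpha, beta, gamma
    sites: List[Site]                          # symmetry-distinct sites only


def to_text(crystal: Crystal, ndigits: int = 3) -> str:
    """Serialize a crystal into one line of text.

    Hypothetical layout (an assumption, not the paper's format):
    "SG <number> | a, b, c, alpha, beta, gamma | El x y z; El x y z; ..."
    """
    lattice = ", ".join(f"{x:.{ndigits}f}" for x in crystal.lattice)
    sites = "; ".join(
        f"{s.element} " + " ".join(f"{c:.{ndigits}f}" for c in s.frac_coords)
        for s in crystal.sites
    )
    return f"SG {crystal.space_group} | {lattice} | {sites}"


# Rock-salt NaCl as a small example (lengths in angstroms, angles in degrees).
nacl = Crystal(
    space_group=225,
    lattice=(5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    sites=[Site("Na", (0.0, 0.0, 0.0)), Site("Cl", (0.5, 0.5, 0.5))],
)
print(to_text(nacl))
```

A single-line encoding of this kind keeps the prompt short, since symmetry information lets the text list only the distinct sites rather than every atom in the cell. Whether the paper's representation makes the same trade-off is not stated in this summary.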