Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

Zhilong Song, Shuaihua Lu, Minggang Ju, Qionghua Zhou, Jinlan Wang

2024-07-10

Abstract: Assessing the synthesizability of crystal structures is pivotal for advancing the practical application of theoretical material structures designed by machine learning or high-throughput screening. However, a significant gap exists between the actual synthesizability and thermodynamic or kinetic stability, which is commonly used for screening theoretical structures for experiments. To address this, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which includes three LLMs for predicting the synthesizability, synthesis methods, and precursors. We create a comprehensive synthesizability dataset including 140,120 crystal structures and develop an efficient text representation method for crystal structures to fine-tune the LLMs. The Synthesizability LLM achieves a remarkable 98.6% accuracy, significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability by 106.1% and 44.5%, respectively. The Methods LLM achieves a classification accuracy of 91.02%, and the Precursors LLM has an 80.2% success rate in predicting synthesis precursors. Furthermore, we develop a user-friendly graphical interface that enables automatic predictions of synthesizability and precursors from uploaded crystal structure files. Through these contributions, CSLLM bridges the gap between theoretical material design and experimental synthesis, paving the way for the rapid discovery of novel and synthesizable functional materials.
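The abstract reports the gain over stability-based screening as 106.1% and 44.5% but does not state the baseline accuracies themselves. As a rough sanity check, the sketch below back-computes the baselines under the assumption that these figures are relative improvements, i.e. model = baseline × (1 + gain/100). That reading is an assumption and is not confirmed by the text above; the function name `implied_baseline` is purely illustrative.

```python
def implied_baseline(model_accuracy: float, relative_gain_pct: float) -> float:
    """Return the baseline b (in %) such that model_accuracy = b * (1 + gain/100).

    Assumes the reported gain is a *relative* improvement, which the
    abstract does not state explicitly.
    """
    return model_accuracy / (1.0 + relative_gain_pct / 100.0)


csllm_accuracy = 98.6  # Synthesizability LLM accuracy from the abstract (%)

# Implied accuracy of screening by thermodynamic stability (gain of 106.1%)
thermo_baseline = implied_baseline(csllm_accuracy, 106.1)
# Implied accuracy of screening by kinetic stability (gain of 44.5%)
kinetic_baseline = implied_baseline(csllm_accuracy, 44.5)

print(f"thermodynamic screening: {thermo_baseline:.1f}%")
print(f"kinetic screening:       {kinetic_baseline:.1f}%")
```

Under this reading, the two stability criteria would correspond to roughly 48% and 68% accuracy. If the gains were instead meant as absolute percentage points, the numbers would not be consistent with a 98.6% model accuracy, which is why the relative interpretation is used here.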

Materials Science

What problem does this paper attempt to address?

This paper addresses how to predict whether a crystal structure can be synthesized and, if so, by which method and from which precursors. Screening by thermodynamic or kinetic stability, the usual approach, does not reliably indicate whether a structure can actually be made. To close this gap, the authors develop the Crystal Synthesis Large Language Models (CSLLM) framework, which comprises three fine-tuned LLMs: a Synthesizability LLM that predicts whether a structure is synthesizable, a Methods LLM that classifies the synthesis method, and a Precursors LLM that predicts precursors. For training data, they take experimentally validated, synthesizable crystal structures from a database as positive samples and use machine learning models to screen for non-synthesizable structures as negative samples, yielding a balanced dataset of 140,120 structures. Crystal structures are converted to text with an efficient representation called the "material string". After fine-tuning, the Synthesizability LLM reaches 98.6% accuracy on the test set, well above screening based on thermodynamic or kinetic stability. The Methods LLM reaches 91.02% classification accuracy, and the Precursors LLM predicts synthesis precursors with an 80.2% success rate. The paper also uses reaction energy calculations and combinatorial analysis to make the predictions more reliable and practical. Finally, a user-friendly graphical interface lets users upload crystal structure files and obtain synthesizability and precursor predictions automatically. In this way, CSLLM bridges the gap between theoretical material design and experimental synthesis and supports the rapid discovery of novel functional materials.
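To give a feel for what a compact text representation of a crystal might look like, here is a minimal sketch that serializes a structure into a single line. The exact layout of the paper's "material string" is not given in this summary, so the format below (space group number, lattice parameters, then element, Wyckoff label, and fractional coordinates per site) is an illustrative assumption, and `Site` and `structure_to_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """One symmetry-distinct atomic site (hypothetical helper type)."""
    element: str
    wyckoff: str  # Wyckoff label, e.g. "4a"
    frac_coords: Tuple[float, float, float]


def structure_to_text(space_group: int,
                      lattice: Tuple[float, float, float, float, float, float],
                      sites: List[Site]) -> str:
    """Serialize a crystal into one line of text.

    Assumed layout (not the paper's confirmed format):
    "<space group> | a, b, c, alpha, beta, gamma | El-Wyckoff[x,y,z]; ..."
    """
    lattice_part = ", ".join(f"{value:.2f}" for value in lattice)
    site_part = "; ".join(
        f"{site.element}-{site.wyckoff}["
        + ",".join(f"{coord:.3f}" for coord in site.frac_coords)
        + "]"
        for site in sites
    )
    return f"{space_group} | {lattice_part} | {site_part}"


# Rock-salt NaCl: space group 225, cubic cell with a of about 5.64 angstrom
nacl_text = structure_to_text(
    225,
    (5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    [Site("Na", "4a", (0.0, 0.0, 0.0)), Site("Cl", "4b", (0.5, 0.5, 0.5))],
)
print(nacl_text)
```

Listing only symmetry-distinct sites keeps the string short, because the space group already implies the remaining atoms. A string of this kind could then be placed in a prompt for fine-tuning or inference.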
