Abstract: We evaluate the effectiveness of pre-trained and fine-tuned large language models (LLMs) for predicting the synthesizability of inorganic compounds and the selection of precursors needed to perform inorganic synthesis. The predictions of fine-tuned LLMs are comparable to, and sometimes better than, recent bespoke machine learning models for these tasks, but require only minimal user expertise, cost, and time to develop. Therefore, this strategy can serve both as an effective and strong baseline for future machine learning studies of various chemical applications and as a practical tool for experimental chemists.

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) can predict the synthesizability of inorganic compounds and select the precursors needed to synthesize them. Specifically, the researchers evaluated pre-trained and fine-tuned LLMs on two tasks:

1. **Synthesizability Prediction**: Given a chemical formula, predict whether the compound can be synthesized. This is a Positive-Unlabeled (PU) learning problem: the available data contain known (previously synthesized) compounds and hypothetical compounds, and the hypothetical ones may or may not be synthesizable, so they cannot be treated as true negatives. The researchers used data from the Materials Project and the Open Quantum Materials Database to define a candidate set of 393,053 unique inorganic compositions. Of these, 40,817 have references in the Inorganic Crystal Structure Database (ICSD) and are treated as positive samples (synthesized); the remaining 352,236 are treated as unlabeled samples (hypothetical).
2. **Precursor Selection**: Given the chemical formula of a target compound, predict all the precursors required to synthesize it. Because the output is restricted to a predefined list of precursors, this is a multi-label prediction problem, and a prediction counts as correct only if it exactly matches the precursor set of a known synthesis example. The researchers started from the text-mined synthesis dataset of Kononova et al., removed inconsistent or incomplete entries, and retained only reactions whose precursors each appear in at least 5 example reactions, yielding 11,923 unique reactions and 311 precursors.

The researchers used GPT-3.5 and GPT-4 as base models and fine-tuned them for both tasks. The fine-tuned LLMs perform comparably to, and sometimes better than, recent machine learning models built specifically for these tasks. The approach is also simple and inexpensive, so it can serve as a strong baseline for future machine learning research and as a practical tool for experimental chemists.
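The data preparation and scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`pu_labels`, `filter_by_precursor_frequency`, `exact_match_accuracy`) are hypothetical, the frequency filter is assumed to be a single counting pass over the unfiltered reactions, and the reactions and formulas are toy placeholders rather than entries from the paper's datasets.

```python
from collections import Counter

# Dataset sizes quoted in the summary: positives + unlabeled = candidate set.
N_TOTAL, N_POSITIVE, N_UNLABELED = 393_053, 40_817, 352_236
assert N_POSITIVE + N_UNLABELED == N_TOTAL


def pu_labels(formulas, icsd_formulas):
    """Label a formula 1 (positive) if it is in the ICSD set, else 0 (unlabeled).

    A 0 means "not known to be synthesized", not "unsynthesizable".
    """
    return {f: int(f in icsd_formulas) for f in formulas}


def filter_by_precursor_frequency(reactions, min_count=5):
    """Keep reactions whose precursors each occur in >= min_count reactions.

    Assumption: counts come from one pass over the unfiltered data.
    """
    counts = Counter(p for _, precursors in reactions for p in set(precursors))
    return [
        (target, precursors)
        for target, precursors in reactions
        if all(counts[p] >= min_count for p in precursors)
    ]


def exact_match_accuracy(predicted, reference):
    """Fraction of targets whose predicted precursor set equals the reference set."""
    hits = sum(set(p) == set(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy data for illustration only; not taken from the paper's datasets.
toy_reactions = [
    ("BaTiO3", ["BaCO3", "TiO2"]),
    ("SrTiO3", ["SrCO3", "TiO2"]),
    ("BaZrO3", ["BaCO3", "ZrO2"]),
]
kept = filter_by_precursor_frequency(toy_reactions, min_count=2)
score = exact_match_accuracy(
    predicted=[["TiO2", "BaCO3"], ["SrCO3"]],
    reference=[["BaCO3", "TiO2"], ["SrCO3", "TiO2"]],
)
# "X2YO7" is a placeholder for a hypothetical composition with no ICSD entry.
labels = pu_labels(["BaTiO3", "X2YO7"], icsd_formulas={"BaTiO3"})
```

Comparing sets rather than lists makes the score independent of precursor order, while a prediction that misses even one precursor still counts as wrong. That strictness is what the exact-match criterion implies.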

Large Language Models for Inorganic Synthesis Predictions
