Assessment of Fine-Tuned Large Language Models for Real-World Chemistry and Material Science Applications

Berend Smit, Joren van Herck, Maria Victoria Gil, Kevin Maik Jablonka, Alex Abrudan, Andy Anker, Mehrdad Asgari, Ben Blaiszik, Leander Choudhury, Clemence Corminboeuf, Hilal Daglar, Ian T. Foster, Susana Garcia, Matthew Garvin, Guillaume Godin, Lydia L. Good, Jianan Gu, Noemie Xiao Hu, Xin Jin, Tanja Junkers, Seda Keskin, Tuomas P.J. Knowles, Ruben Laplaza, Sauradeep Majumdar, Hossein Mashhadimoslem, Ruaraidh D. McIntosh, Seyed Mohamad Moosavi, Beatriz Mourino, Francesca Nerli, Covadonga Pevida, Neda Poudineh, Mahyar Rajabi-Kochi, Kadi L. Saar, Fahimeh H. Saboor, Morteza Sagharichiha, KJ Schmidt, Jiale Shi, Dennis Svatunek, Marco Taddei, Igor Tetko, Domonkos Tolnai, Sahar Vahdatifar, Jonathan Whitmer, Florian Wieland, Regine Willumeit-Romer, Andreas Zuttel
DOI: https://doi.org/10.26434/chemrxiv-2024-mm31v
2024-07-31
Abstract: The current generation of large language models (LLMs), like ChatGPT, has limited chemical knowledge. Recently, it has been shown that these LLMs can learn and predict chemical properties through fine-tuning. In this work, we explore the potential and limitations of this approach. We studied the performance of fine-tuning GPT-J-6B, a public-domain version of the GPT family, for a range of different chemical questions. We find that in most, if not all, cases, this approach outperforms the benchmark (random guessing) for a simple classification problem. Depending on the size of the dataset and the type of questions, we can also address more sophisticated problems. The most important conclusions of this work are that, for all datasets considered, their conversion into an LLM fine-tuning training set is straightforward and that fine-tuning with even relatively small datasets leads to predictive models. These results suggest that the systematic use of LLMs to guide experiments and simulations will be a powerful technique in any research study, significantly reducing unnecessary experiments or computations.
Chemistry
### What problems does this paper attempt to solve?

This paper evaluates the potential and limitations of large language models (LLMs) in real-world chemistry and materials science applications. Specifically, the authors attempt to answer the following core questions:

1. **Limited chemical knowledge of existing LLMs**: The current generation of large language models (such as ChatGPT) has limited knowledge in the field of chemistry. The authors hope to fine-tune these models so that they can better understand and predict chemical properties.
2. **Effect of fine-tuning LLMs**: The researchers want to verify whether fine-tuning LLMs can significantly improve their performance on chemistry and materials science problems. They selected the open-source model GPT-J-6B and conducted fine-tuning experiments on different datasets and tasks.
3. **Performance in practical applications**: The paper explores how fine-tuned LLMs perform on practical chemistry and materials science problems, such as predicting the adhesion energy of polymers, the glass transition temperature of monomers, and the melting point of small molecules. It also covers the phase-separation tendency of proteins, the microstructural characteristics of magnesium alloys, and the structural types of nanoparticles.
4. **Simplifying experimental design and calculation**: By using fine-tuned LLMs, the researchers hope to reduce unnecessary experiments or calculations and thereby guide experimental and simulation studies more efficiently. This would help accelerate the research and development of new materials and new drugs.
5. **Advantages of natural-language input**: Compared with traditional machine-learning methods, LLMs can directly accept natural language as input, so researchers can interact more conveniently with data and tools without complex feature engineering.
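To make the natural-language input idea concrete, the following is a minimal sketch of how a small tabular dataset might be turned into prompt/completion pairs for fine-tuning on a binary classification question. The prompt wording, the median split into "high"/"low" classes, the function name `build_records`, and the placeholder rows are all assumptions made for illustration; they are not taken from the paper, and the numeric values are invented.

```python
import json
from statistics import median


def build_records(rows, property_name):
    """Convert (name, value) rows into prompt/completion pairs.

    Values above the dataset median are labeled "high", the rest "low".
    The median split and the prompt template are illustrative assumptions.
    """
    threshold = median(value for _, value in rows)
    records = []
    for name, value in rows:
        label = "high" if value > threshold else "low"
        records.append(
            {
                "prompt": f"What is the {property_name} of {name}?",
                "completion": label,
            }
        )
    return records


# Hypothetical placeholder rows; the numbers are invented, not measured data.
rows = [
    ("material_A", 1.0),
    ("material_B", 2.0),
    ("material_C", 3.0),
    ("material_D", 4.0),
]

records = build_records(rows, "melting point")

# One JSON object per line (JSONL), a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

No hand-crafted descriptors appear anywhere in this sketch: the text identifier of each material is placed directly into the question, which is the point the item on natural-language input makes.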
### Main conclusions

- **Simple dataset conversion**: All considered datasets can be relatively easily converted into training sets for LLM fine-tuning.
- **Small datasets can also be effective**: Even when fine-tuning with relatively small datasets, models with predictive ability can be obtained.
- **Wide applicability**: Fine-tuned LLMs can perform well on a variety of chemistry and materials science problems, including classification, regression, and inverse-design problems.
- **Significantly better than random guessing**: In most cases, the performance of the fine-tuned model is significantly better than the random-guessing benchmark.

Through these studies, the authors show that systematically using LLMs to guide experiments and simulations can become a powerful technique in any research study, significantly reducing unnecessary experiments or calculations.
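The comparison against random guessing can be sketched as follows. This is a rough illustration only: the helper names, the toy labels, and the stand-in predictions are hypothetical and do not reproduce any result from the paper. It estimates the accuracy of uniform random guessing by simulation and compares a model's accuracy against it.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_guess_accuracy(y_true, classes, n_trials=1000, seed=0):
    """Average accuracy of uniform random guessing over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        guesses = [rng.choice(classes) for _ in y_true]
        total += accuracy(y_true, guesses)
    return total / n_trials


# Hypothetical balanced test labels and stand-in model predictions.
y_true = ["high", "low", "high", "low", "high", "low", "high", "low"]
y_model = ["high", "low", "high", "low", "high", "low", "low", "low"]

baseline = random_guess_accuracy(y_true, ["high", "low"])
model_acc = accuracy(y_true, y_model)
print(f"random guessing: {baseline:.2f}, model: {model_acc:.2f}")
```

For a balanced two-class problem the simulated baseline settles near 0.5, so a fine-tuned model is only informative to the extent that its accuracy clearly exceeds that value.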