Assessment of Fine-Tuned Large Language Models for Real-World Chemistry and Material Science Applications
Berend Smit, Joren van Herck, Maria Victoria Gil, Kevin Maik Jablonka, Alex Abrudan, Andy Anker, Mehrdad Asgari, Ben Blaiszik, Leander Choudhury, Clemence Corminboeuf, Hilal Daglar, Ian T. Foster, Susana Garcia, Matthew Garvin, Guillaume Godin, Lydia L. Good, Jianan Gu, Noemie Xiao Hu, Xin Jin, Tanja Junkers, Seda Keskin, Tuomas P.J. Knowles, Ruben Laplaza, Sauradeep Majumdar, Hossein Mashhadimoslem, Ruaraidh D. McIntosh, Seyed Mohamad Moosavi, Beatriz Mourino, Francesca Nerli, Covadonga Pevida, Neda Poudineh, Mahyar Rajabi-Kochi, Kadi L. Saar, Fahimeh H. Saboor, Morteza Sagharichiha, KJ Schmidt, Jiale Shi, Dennis Svatunek, Marco Taddei, Igor Tetko, Domonkos Tolnai, Sahar Vahdatifar, Jonathan Whitmer, Florian Wieland, Regine Willumeit-Romer, Andreas Zuttel
DOI: https://doi.org/10.26434/chemrxiv-2024-mm31v
2024-07-31
Abstract: The current generation of large language models (LLMs), such as ChatGPT, has limited chemical knowledge. Recently, it has been shown that these LLMs can learn and predict chemical properties through fine-tuning. In this work, we explore the potential and limitations of this approach. We studied the performance of fine-tuned GPT-J-6B, a public-domain version of the GPT family, on a range of chemical questions. We find that in most, if not all, cases, this approach outperforms the benchmark (random guessing) for a simple classification problem. Depending on the size of the dataset and the type of question, we can also address more sophisticated problems. The most important conclusions of this work are that, for all datasets considered, conversion into an LLM fine-tuning training set is straightforward, and that fine-tuning with even relatively small datasets leads to predictive models. These results suggest that the systematic use of LLMs to guide experiments and simulations will be a powerful technique in any research study, significantly reducing unnecessary experiments or computations.
Chemistry
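
The abstract states that converting a dataset into an LLM fine-tuning training set is straightforward. The following is a minimal sketch of what such a conversion might look like, assuming a tabular dataset of (representation, label) pairs written out as prompt/completion records in JSON Lines. The prompt template, the function names, and the toy rows are hypothetical illustrations and are not taken from the paper.

```python
import json


def to_finetune_record(representation, property_name, label):
    """Turn one labelled example into a prompt/completion pair.

    The question template is an assumption made for illustration only.
    """
    prompt = f"What is the {property_name} of {representation}?"
    return {"prompt": prompt, "completion": str(label)}


def to_jsonl(rows, property_name):
    """Serialize (representation, label) rows as one JSON object per line."""
    records = (to_finetune_record(rep, property_name, label) for rep, label in rows)
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical toy data: SMILES strings with made-up binary class labels.
rows = [("CCO", 1), ("c1ccccc1", 0)]
jsonl = to_jsonl(rows, "solubility class")
```

Each line of `jsonl` is then a single training example. For a binary classification task like this one, the random-guessing benchmark mentioned in the abstract corresponds to roughly 50% accuracy on a balanced test set.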