Abstract: This study is dedicated to assessing the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we primarily focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation extraction (RE) between these entities. Due to the evident lack of datasets within Materials Informatics (MI), we evaluated using SuperMat, based on superconductor research, and MeasEval, a generic measurement evaluation corpus. The performance of LLMs in executing these tasks is benchmarked against traditional models based on the BERT architecture and rule-based approaches (baseline). We introduce a novel methodology for the comparative analysis of intricate material expressions, emphasising the standardisation of chemical formulas to tackle the complexities inherent in materials science information assessment. For NER, LLMs fail to outperform the baseline with zero-shot prompting and exhibit only limited improvement with few-shot prompting. However, a GPT-3.5-Turbo fine-tuned with the appropriate strategy for RE outperforms all models, including the baseline. Without any fine-tuning, GPT-4 and GPT-4-Turbo display remarkable reasoning and relationship extraction capabilities after being provided with merely a couple of examples, surpassing the baseline. Overall, the results suggest that although LLMs demonstrate relevant reasoning skills in connecting concepts, specialised models are currently a better choice for tasks requiring extracting complex domain-specific entities like materials. These insights provide initial guidance applicable to other materials science sub-domains in future work.

What problem does this paper attempt to address?

This paper evaluates how well large language models (LLMs) extract structured information from materials science literature. It focuses on two key tasks:

1. **Named Entity Recognition (NER)**: identifying the studied materials and their physical properties.
2. **Relation Extraction (RE)**: extracting the relationships between these entities.

Because datasets in materials science are scarce, the authors evaluate on SuperMat (a dataset based on superconductor research) and MeasEval (a general measurement evaluation corpus). They compare LLMs against traditional BERT-based models and rule-based methods (the baseline) to see how LLMs handle the extraction of complex domain-specific entities such as materials.

### Main Problems

1. **How to effectively use LLMs to extract relevant information from materials science literature?**
   - The paper addresses this question through the NER task. With zero-shot prompting, LLMs do not outperform the baseline, and few-shot prompting brings only limited improvement.
2. **What is the reasoning ability of LLMs when connecting complex concepts?**
   - The paper addresses this question through the RE task. Without fine-tuning, GPT-4 and GPT-4-Turbo show strong relation reasoning and extraction abilities after seeing only a couple of examples, surpassing the baseline. A GPT-3.5-Turbo fine-tuned with an appropriate strategy outperforms all models, including the baseline.

### Research Contributions

- Designed and ran a benchmark to evaluate the performance of LLMs on named entity recognition of materials and properties.
- Evaluated the performance of LLMs on relation extraction in the context of materials science.
- Proposed a novel evaluation method, formula matching, which assesses the extraction accuracy of material entities through element-by-element comparison of chemical formulas.

Overall, this research provides preliminary guidance for future information extraction in materials science and points out both the potential and the limitations of LLMs in handling complex domain-specific entities: LLMs reason well when connecting concepts, but specialised models currently remain the better choice for extracting entities such as materials.
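The formula-matching idea listed among the contributions can be illustrated with a small sketch. This is not the authors' implementation: the function names `parse_formula` and `formulas_match`, the tolerance, and the fallback to exact string comparison are all assumptions made for illustration. It only shows why comparing element amounts can succeed where plain string comparison fails, for example when the same compound is written with its elements in a different order.

```python
import re
from collections import defaultdict

# Tokens: element symbol, number (integer or decimal), or a parenthesis.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\d+(?:\.\d+)?)|([()])")


def parse_formula(formula):
    """Parse a formula such as 'Ba(Fe0.9Co0.1)2As2' into {element: amount}.

    Hypothetical helper: raises ValueError for anything it cannot parse,
    e.g. variable stoichiometry such as 'La2-xSrxCuO4'.
    """
    text = formula.replace(" ", "")
    tokens = TOKEN.findall(text)
    # Reject input containing characters that no token covers.
    if "".join(e or n or p for e, n, p in tokens) != text:
        raise ValueError(f"unsupported formula: {formula!r}")

    stack = [defaultdict(float)]  # one dict per open parenthesis level
    pending = None                # element or group awaiting a multiplier

    def flush(multiplier=1.0):
        nonlocal pending
        if pending is not None:
            for element, amount in pending.items():
                stack[-1][element] += amount * multiplier
            pending = None

    for element, number, paren in tokens:
        if element:
            flush()
            pending = {element: 1.0}
        elif number:
            if pending is None:
                raise ValueError(f"dangling number in {formula!r}")
            flush(float(number))
        elif paren == "(":
            flush()
            stack.append(defaultdict(float))
        else:  # closing parenthesis
            flush()
            if len(stack) == 1:
                raise ValueError(f"unbalanced parenthesis in {formula!r}")
            pending = dict(stack.pop())
    flush()
    if len(stack) != 1:
        raise ValueError(f"unbalanced parenthesis in {formula!r}")
    return dict(stack[0])


def formulas_match(predicted, expected, tol=1e-6):
    """Compare two formulas element by element (assumed matching rule)."""
    try:
        a, b = parse_formula(predicted), parse_formula(expected)
    except ValueError:
        # Assumption: fall back to exact string comparison when parsing fails.
        return predicted.strip() == expected.strip()
    return a.keys() == b.keys() and all(abs(a[e] - b[e]) <= tol for e in a)


print(formulas_match("La2CuO4", "CuO4La2"))
print(formulas_match("Ba(Fe0.9Co0.1)2As2", "BaFe1.8Co0.2As2"))
print(formulas_match("La2CuO4", "La2CuO3"))
```

The first two comparisons differ as strings but describe the same composition, while the third differs in oxygen content. A real evaluation would also need to handle doping variables, material names, and partial matches, which this sketch deliberately leaves out.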

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

MaScQA: Investigating Materials Science Knowledge of Large Language Models

Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models

From Text to Insight: Large Language Models for Materials Science Data Extraction

Materials science in the era of large language models: a perspective

Are LLMs Ready for Real-World Materials Discovery?

Towards Development of Automated Knowledge Maps and Databases for Materials Engineering using Large Language Models

Exploring the Potential of Large Language Models in Molecular Tasks: An Insightful Evaluation with GPT‐4

Evaluation of Open-Source Large Language Models for Metal-Organic Frameworks Research

NLP meets Materials Science: Quantifying the presentation of materials data in scientific literature

LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction

Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions

Polymetis: Large Language Modeling for Multiple Material Domains

MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models

Comparative Study of Large Language Model Architectures on Frontier

From Tokens to Materials: Leveraging Language Models for Scientific Discovery

Fine-tuning Large Language Models for Chemical Text Mining

Extracting accurate materials data from research papers with conversational language models and prompt engineering