Abstract: For over a decade, machine learning (ML) models have been making strides in computer vision and natural language processing (NLP), demonstrating high proficiency in specialized tasks. The emergence of large-scale language and generative image models, such as ChatGPT and Stable Diffusion, has significantly broadened the accessibility and application scope of these technologies. Traditional predictive models are typically constrained to mapping input data to numerical values or predefined categories, limiting their usefulness beyond their designated tasks. In contrast, contemporary models employ representation learning and generative modeling, enabling them to extract and encode key insights from a wide variety of data sources and decode them to create novel responses for desired goals. They can interpret queries phrased in natural language to deduce the intended output. In parallel, the application of ML techniques in materials science has advanced considerably, particularly in areas like inverse design, material prediction, and atomic modeling. Despite these advancements, the current models are overly specialized, hindering their potential to supplant established industrial processes. Materials science, therefore, necessitates the creation of a comprehensive, versatile model capable of interpreting human-readable inputs, intuiting a wide range of possible search directions, and delivering precise solutions. To realize such a model, the field must adopt cutting-edge representation, generative, and foundation model techniques tailored to materials science. A pivotal component in this endeavor is the establishment of an extensive, centralized dataset encompassing a broad spectrum of research topics. This dataset could be assembled by crowdsourcing global research contributions and developing models to extract data from existing literature and represent them in a homogeneous format. A massive dataset can be used to train a central model that learns the underlying physics of the target areas, which can then be connected to a variety of specialized downstream tasks. Ultimately, the envisioned model would empower users to intuitively pose queries for a wide array of desired outcomes. It would facilitate the search for existing data that closely matches the sought-after solutions and leverage its understanding of physics and material-behavior relationships to innovate new solutions when pre-existing ones fall short.

MaScQA: Investigating Materials Science Knowledge of Large Language Models

MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models

Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

Polymetis: Large Language Modeling for Multiple Material Domains

SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset

Towards Development of Automated Knowledge Maps and Databases for Materials Engineering using Large Language Models

Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models

Materials science in the era of large language models: a perspective

Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding

Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

From Text to Insight: Large Language Models for Materials Science Data Extraction

Ontology-conformal recognition of materials entities using language models

Evaluation of Open-Source Large Language Models for Metal-Organic Frameworks Research

Are LLMs Ready for Real-World Materials Discovery?

Advancing materials science through next-generation machine learning