Abstract: For over a decade, machine learning (ML) models have been making strides in computer vision and natural language processing (NLP), demonstrating high proficiency in specialized tasks. The emergence of large-scale language and generative image models, such as ChatGPT and Stable Diffusion, has significantly broadened the accessibility and application scope of these technologies. Traditional predictive models are typically constrained to mapping input data to numerical values or predefined categories, limiting their usefulness beyond their designated tasks. In contrast, contemporary models employ representation learning and generative modeling, enabling them to extract and encode key insights from a wide variety of data sources and decode them to create novel responses for desired goals. They can interpret queries phrased in natural language to deduce the intended output. In parallel, the application of ML techniques in materials science has advanced considerably, particularly in areas like inverse design, material prediction, and atomic modeling. Despite these advancements, the current models are overly specialized, hindering their potential to supplant established industrial processes. Materials science, therefore, necessitates the creation of a comprehensive, versatile model capable of interpreting human-readable inputs, intuiting a wide range of possible search directions, and delivering precise solutions. To realize such a model, the field must adopt cutting-edge representation, generative, and foundation model techniques tailored to materials science. A pivotal component in this endeavor is the establishment of an extensive, centralized dataset encompassing a broad spectrum of research topics. This dataset could be assembled by crowdsourcing global research contributions and developing models to extract data from existing literature and represent them in a homogeneous format.
A massive dataset can be used to train a central model that learns the underlying physics of the target areas, which can then be connected to a variety of specialized downstream tasks. Ultimately, the envisioned model would empower users to intuitively pose queries for a wide array of desired outcomes. It would facilitate the search for existing data that closely matches the sought-after solutions and leverage its understanding of physics and material-behavior relationships to innovate new solutions when pre-existing ones fall short.

A Prompt-Engineered Large Language Model, Deep Learning Workflow for Materials Classification

LLMatDesign: Autonomous Materials Discovery with Large Language Models

Beyond designer's knowledge: Generating materials design hypotheses via large language models

Materials science in the era of large language models: a perspective

Evaluating Large Language Models for Material Selection

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

Exploring large language models for microstructure evolution in materials

From Text to Insight: Large Language Models for Materials Science Data Extraction

Large Language Models for Material Property Predictions: elastic constant tensor prediction and materials design

Are LLMs Ready for Real-World Materials Discovery?

Polymetis: Large Language Modeling for Multiple Material Domains

Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

Regression with Large Language Models for Materials and Molecular Property Prediction

Enhancing Large Language Model Comprehension of Material Phase Diagrams through Prompt Engineering and Benchmark Datasets

BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-inspired Materials

Advancing materials science through next-generation machine learning

MatExpert: Decomposing Materials Discovery by Mimicking Human Experts

Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering