AlloyBERT: Alloy Property Prediction with Large Language Models

Akshat Chaudhari, Chakradhar Guntuboina, Hongshuo Huang, Amir Barati Farimani

2024-03-29

Abstract: The pursuit of novel alloys tailored to specific requirements poses significant challenges for researchers in the field. This underscores the importance of developing predictive techniques for essential physical properties of alloys based on their chemical composition and processing parameters. This study introduces AlloyBERT, a transformer encoder-based model designed to predict properties such as elastic modulus and yield strength of alloys using textual inputs. Leveraging the pre-trained RoBERTa encoder model as its foundation, AlloyBERT employs self-attention mechanisms to establish meaningful relationships between words, enabling it to interpret human-readable input and predict target alloy properties. By combining a tokenizer trained on our textual data and a RoBERTa encoder pre-trained and fine-tuned for this specific task, we achieved a mean squared error (MSE) of 0.00015 on the Multi Principal Elemental Alloys (MPEA) dataset and 0.00611 on the Refractory Alloy Yield Strength (RAYS) dataset. This surpasses the performance of shallow models, which achieved a best-case MSE of 0.00025 and 0.0076 on the MPEA and RAYS datasets respectively. Our results highlight the potential of language models in material science and establish a foundational framework for text-based prediction of alloy properties that does not rely on complex underlying representations, calculations, or simulations.

Materials Science, Machine Learning

What problem does this paper attempt to address?

The paper "AlloyBERT: Alloy Property Prediction with Large Language Models" addresses a central problem in alloy materials science: how to predict physical properties of alloys, such as elastic modulus and yield strength, quickly and accurately from their chemical composition and processing parameters. Experimental screening is inefficient given the diversity of possible alloy combinations, and computational approaches are complex. The paper proposes AlloyBERT, a Transformer encoder model that builds on the pre-trained RoBERTa encoder and uses self-attention mechanisms to establish meaningful relationships between the words of the input text. This allows the model to interpret human-readable inputs and predict alloy properties. By combining a tokenizer trained on the alloy text data with a RoBERTa encoder that is pre-trained and fine-tuned for the task, AlloyBERT achieves mean squared errors (MSE) of 0.00015 on the Multi Principal Elemental Alloys (MPEA) dataset and 0.00611 on the Refractory Alloy Yield Strength (RAYS) dataset, surpassing the best shallow models, which reach 0.00025 and 0.0076 on the same datasets. The paper also explores how different types of text input affect model performance and finds that the most detailed input format, which includes atomic information and physical properties of the elements, combined with pre-training and fine-tuning, significantly improves prediction accuracy. AlloyBERT's R² scores (0.99 on the MPEA dataset and 0.83 on the RAYS dataset) further indicate its predictive capability. Overall, the paper demonstrates the potential of language models in materials science, particularly for alloy property prediction, and provides a foundational framework for text-based prediction that reduces reliance on complex computations and simulations.
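To make the text-based setup more concrete, the following is a minimal sketch of two ingredients mentioned above: turning a composition into a human-readable sentence, and scoring predictions with MSE and R². The function names (`describe_alloy`, `mse`, `r2`), the sentence template, and the example values are hypothetical and chosen only for illustration; they are not the input format or code used by the AlloyBERT authors. The metrics follow their standard textbook definitions.

```python
from typing import Dict, Sequence


def describe_alloy(composition: Dict[str, float], process: str) -> str:
    """Render a composition (element -> atomic fraction) as one plain sentence.

    The wording of the sentence is an assumption made for this sketch.
    """
    parts = [f"{frac:.2f} {element}" for element, frac in sorted(composition.items())]
    return f"Alloy with atomic fractions {', '.join(parts)}; processing: {process}."


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up example: an equiatomic four-element alloy and toy predictions.
text = describe_alloy({"Nb": 0.25, "Mo": 0.25, "Ta": 0.25, "W": 0.25}, "as-cast")
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
print(text)
print(mse(toy_true, toy_pred), r2(toy_true, toy_pred))
```

In a pipeline of the kind the paper describes, a sentence like `text` would be tokenized and passed to the encoder with a regression head; the sketch stops short of that step so it runs without any model download.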

High Entropy Alloy property predictions using Transformer-based language model

A primitive machine learning tool for the mechanical property prediction of multiple principal element alloys

Interpretable Machine Learning for High-Strength High-Entropy Alloy Design

General-purpose Machine-Learned Potential for 16 Elemental Metals and Their Alloys

Enhancing corrosion-resistant alloy design through natural language processing and deep learning

Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models

PeptideBERT: A Language Model based on Transformers for Peptide Property Prediction

Towards the holistic design of alloys with large language models

From Tokens to Materials: Leveraging Language Models for Scientific Discovery

Machine learning-aided phase and mechanical properties prediction in multi-principal element alloys

Predictive Modeling of High-Entropy Alloys and Amorphous Metallic Alloys Using Machine Learning

Physics-informed machine learning for composition-process-property alloy design: shape memory alloy demonstration

From Small Data Modeling to Large Language Model Screening: A Dual‐Strategy Framework for Materials Intelligent Design

AtomAgents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence

An interpretable boosting-based predictive model for transformation temperatures of shape memory alloys

Predicting Yield Strength and Plastic Elongation in Body-Centered Cubic High-Entropy Alloys

Machine Learning for Alloy Composition and Process Optimization

A feasibility study of machine learning-assisted alloy design using wrought aluminum alloys as an example

Integrating machine learning with mechanistic models for predicting the yield strength of high entropy alloys

Large Language Models for Material Property Predictions: elastic constant tensor prediction and materials design