Abstract: High-entropy materials (HEMs) have recently emerged as a significant category of materials, offering highly tunable properties. However, the scarcity of HEM data in existing density functional theory (DFT) databases, primarily due to computational expense, hinders the development of effective modeling strategies for computational materials discovery. In this study, we introduce an open DFT dataset of alloys and employ machine learning (ML) methods to investigate the material representations needed for HEM modeling. Utilizing high-throughput DFT calculations, we generate a comprehensive dataset of 84k structures, encompassing both ordered and disordered alloys across a spectrum of up to seven components and the entire compositional range. We apply descriptor-based models and graph neural networks to assess how material information is captured across diverse chemical-structural representations. We first evaluate the in-distribution performance of ML models to confirm their predictive accuracy. Subsequently, we demonstrate the capability of ML models to generalize between ordered and disordered structures, between low-order and high-order alloys, and between equimolar and non-equimolar compositions. Our findings suggest that ML models can generalize from cost-effective calculations of simpler systems to more complex scenarios. Additionally, we discuss the influence of dataset size and reveal that the information loss associated with the use of unrelaxed structures could significantly degrade the generalization performance. Overall, this research sheds light on several critical aspects of HEM modeling and offers insights for data-driven atomistic modeling of HEMs.
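Each of the three generalization tests described in the abstract amounts to partitioning the data by a property of each entry, training on one side and testing on the other. The sketch below shows that bookkeeping under stated assumptions: compositions are stored as hypothetical `{element: atomic fraction}` dicts paired with an ordered/disordered flag, the cutoff of three components for "low-order" is an arbitrary illustrative choice, and the helper names (`n_components`, `is_equimolar`, `split`) are invented for this example rather than taken from the paper's code.

```python
from math import isclose

# Hypothetical toy records: (composition as element -> atomic fraction, is_ordered flag).
# These are illustrative placeholders, not entries from the paper's dataset.
records = [
    ({"Cr": 0.5, "Ni": 0.5}, True),
    ({"Co": 0.25, "Ni": 0.75}, True),
    ({"Co": 1 / 3, "Cr": 1 / 3, "Ni": 1 / 3}, False),
    ({"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}, False),
    ({"Co": 0.1, "Cr": 0.3, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}, False),
]


def n_components(comp):
    """Alloy order: number of elements with a nonzero atomic fraction."""
    return sum(1 for x in comp.values() if x > 0)


def is_equimolar(comp, tol=1e-6):
    """True if every element present has the same atomic fraction (within tol)."""
    fracs = [x for x in comp.values() if x > 0]
    return all(isclose(x, fracs[0], abs_tol=tol) for x in fracs)


def split(records, in_train):
    """Partition records into (train, test) using a predicate on each record."""
    train = [r for r in records if in_train(r)]
    test = [r for r in records if not in_train(r)]
    return train, test


# Low-order -> high-order: train on binaries and ternaries (assumed cutoff), test on 4+.
order_train, order_test = split(records, lambda r: n_components(r[0]) <= 3)
# Equimolar -> non-equimolar compositions.
eq_train, eq_test = split(records, lambda r: is_equimolar(r[0]))
# Ordered -> disordered structures.
ord_train, ord_test = split(records, lambda r: r[1])
```

The same `split` helper serves all three tests; only the predicate changes, which keeps the train/test boundary explicit and easy to audit.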

What problem does this paper attempt to address?

The problems that this paper attempts to solve are the high computational cost and data scarcity in high-entropy material (HEM) modeling. Specifically:

1. **Data scarcity of high-entropy materials**: Existing density functional theory (DFT) databases contain little data on high-entropy materials, mainly because simulating chemical disorder is computationally expensive. This hinders the development of effective HEM modeling strategies.
2. **Generalization from simple systems to complex systems**: Machine learning (ML) models are usually trained on ordered structures or special quasirandom structures (SQSs), but how well such models generalize to more complex high-entropy materials has not been systematically established. Determining whether models trained on simple, low-cost systems can generalize to more complex HEMs is therefore a key issue.
3. **Generalization across compositions**: Existing research focuses mainly on equimolar high-entropy materials, and there is relatively little data for non-equimolar compositions. Exploring whether ML models can generalize from equimolar to non-equimolar compositions is therefore also important.

To address these problems, the authors carried out the following work:

- **Generating a large-scale DFT dataset**: A dataset of about 84,000 alloy structures was generated through high-throughput DFT calculations, covering ordered and disordered alloys with 2 to 7 components across the entire compositional range.
- **Evaluating ML models**: Descriptor-based models and graph neural networks were used to evaluate how material information is captured by different chemical-structural representations. Their generalization was tested from ordered to disordered structures, from low-order to high-order alloys, and from equimolar to non-equimolar compositions.
- **Discussing the influence of dataset size and structural relaxation**: The effect of dataset size on model performance was analyzed, and the information loss caused by using unrelaxed structures was shown to be capable of significantly degrading generalization performance.

In conclusion, this research aims to improve the efficiency and accuracy of HEM modeling by combining a large-scale DFT dataset with ML methods, thereby supporting the design and discovery of high-entropy materials.
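A composition-only descriptor is the simplest kind of chemical representation a descriptor-based model can use. As a minimal sketch, assuming a hypothetical fixed element list and placeholder target values (not DFT results), the code below maps each composition to a normalized fraction vector and predicts a target by k-nearest-neighbor regression. The kNN model is a stand-in chosen for brevity; it is not the descriptor-based model or graph neural network used in the paper, and `composition_vector` and `knn_predict` are hypothetical names.

```python
import math

# Hypothetical fixed element order for the descriptor (an assumption for illustration).
ELEMENTS = ["Co", "Cr", "Fe", "Mn", "Ni"]


def composition_vector(comp):
    """Map {element: amount} to a fixed-length vector of atomic fractions summing to 1."""
    total = sum(comp.values())
    return [comp.get(el, 0.0) / total for el in ELEMENTS]


def knn_predict(train, query, k=2):
    """Average the targets of the k training compositions closest to query (Euclidean)."""
    q = composition_vector(query)
    dists = sorted((math.dist(composition_vector(c), q), y) for c, y in train)
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical (composition, target) pairs; the targets are placeholders, not DFT results.
train = [
    ({"Cr": 1, "Ni": 1}, -0.02),
    ({"Co": 1, "Cr": 1, "Fe": 1}, 0.03),
    ({"Cr": 1, "Ni": 3}, -0.04),
    ({"Mn": 1, "Ni": 1}, 0.05),
]

# Predict for an equimolar quaternary not present in the training set.
pred = knn_predict(train, {"Co": 1, "Cr": 1, "Fe": 1, "Ni": 1}, k=2)
```

Because this descriptor ignores atomic positions, ordered and disordered structures with the same composition map to the same vector; structure-aware representations such as graph neural networks can distinguish them.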

Efficient first principles based modeling via machine learning: from simple representations to high entropy materials

Using Machine Learning and Feature Engineering to Characterize Limited Material Datasets of High-Entropy Alloys

Machine learning for high-entropy alloys: Progress, challenges and opportunities

Recent machine learning-driven investigations into high entropy alloys: A comprehensive review

Machine Learning Paves the Way for High Entropy Compounds Exploration: Challenges, Progress, and Outlook

Machine Learning Design for High-Entropy Alloys: Models and Algorithms

Predictive Modeling of High-Entropy Alloys and Amorphous Metallic Alloys Using Machine Learning

Descriptor-Enabled Rational Design of High-Entropy Materials Over Vast Chemical Spaces

Interpretable Machine Learning for High-Strength High-Entropy Alloy Design

Development of machine learning based models for design of high entropy alloys

Composition design of high-entropy alloys with deep sets learning

High-Entropy Materials Design by Integrating the First-Principles Calculations and Machine Learning: A Case Study in the Al-Co-Cr-Fe-Ni System

Phase prediction in high entropy alloys with a rational selection of materials descriptors and machine learning models

Electronic structure prediction of medium and high entropy alloys across composition space

Machine Learning-Based Classification, Interpretation, and Prediction of High-Entropy-Alloy Intermetallic Phases

Machine Learning Assisted Design of High Entropy Alloys with Desired Property

Microstructures and Properties of High‐Entropy Materials: Modeling, Simulation, and Experiments

Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery

Machine Learning and Data Analytics for Design and Manufacturing of High-Entropy Materials Exhibiting Mechanical or Fatigue Properties of Interest

Predicting and Understanding the Ductility of BCC High Entropy Alloys Via Knowledge-Integrated Machine Learning