Efficient first principles based modeling via machine learning: from simple representations to high entropy materials

Kangming Li, Kamal Choudhary, Brian DeCost, Michael Greenwood, Jason Hattrick-Simpers
2024-03-23
Abstract: High-entropy materials (HEMs) have recently emerged as a significant category of materials, offering highly tunable properties. However, the scarcity of HEM data in existing density functional theory (DFT) databases, primarily due to computational expense, hinders the development of effective modeling strategies for computational materials discovery. In this study, we introduce an open DFT dataset of alloys and employ machine learning (ML) methods to investigate the material representations needed for HEM modeling. Utilizing high-throughput DFT calculations, we generate a comprehensive dataset of 84k structures, encompassing both ordered and disordered alloys across a spectrum of up to seven components and the entire compositional range. We apply descriptor-based models and graph neural networks to assess how material information is captured across diverse chemical-structural representations. We first evaluate the in-distribution performance of ML models to confirm their predictive accuracy. Subsequently, we demonstrate the capability of ML models to generalize between ordered and disordered structures, between low-order and high-order alloys, and between equimolar and non-equimolar compositions. Our findings suggest that ML models can generalize from cost-effective calculations of simpler systems to more complex scenarios. Additionally, we discuss the influence of dataset size and reveal that the information loss associated with the use of unrelaxed structures could significantly degrade the generalization performance. Overall, this research sheds light on several critical aspects of HEM modeling and offers insights for data-driven atomistic modeling of HEMs.
Materials Science
What problem does this paper attempt to address?
This paper addresses the high computational cost and data scarcity that hamper the modeling of high-entropy materials (HEMs). Specifically:

1. **Data scarcity of high-entropy materials**: Existing density functional theory (DFT) databases contain little data on high-entropy materials, mainly because simulating chemical disorder is computationally very expensive. This hinders the development of effective modeling strategies for high-entropy materials.
2. **Generalization from simple systems to complex systems**: Machine-learning (ML) models are usually trained on ordered structures or special quasirandom structures (SQSs), and such models can perform poorly when applied to more complex high-entropy materials. A key question is therefore how to generalize from simple systems that are cheap to compute to more complex high-entropy materials.
3. **Generalization across compositions and numbers of components**: Existing research focuses mainly on high-entropy materials with equimolar compositions, and relatively little data is available for non-equimolar compositions. Whether ML models can generalize from equimolar to non-equimolar compositions is therefore also an important question.

To address these problems, the authors carried out the following work:

- **Generate a large-scale DFT dataset**: A dataset of 84,000 alloy structures was generated through high-throughput DFT calculations, covering ordered and disordered alloys with 2 to 7 components.
- **Evaluate the performance of machine-learning models**: Descriptor-based models and graph neural networks were used to assess how well different chemical-structural representations capture material information. The generalization ability of the models was tested from ordered to disordered structures, from low-order to high-order alloys, and from equimolar to non-equimolar compositions.
- **Discuss the influence of dataset size and structural relaxation**: The effect of dataset size on model performance was analyzed, along with the information loss that can result from using unrelaxed structures.

In conclusion, this research aims to improve the efficiency and accuracy of high-entropy materials modeling by generating a large-scale DFT dataset and combining it with machine-learning methods, thereby supporting the design and discovery of high-entropy materials.
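
The three generalization tests described above each amount to a train/test split in which the test set lies outside the training distribution. The following is a minimal sketch of how such splits could be built, not the authors' code: the record layout (`composition`, `ordered`), the toy alloys, and the threshold of three components for "low-order" are all assumptions made for illustration.

```python
from math import isclose


def n_components(composition):
    """Count the elements present with a non-zero atomic fraction."""
    return sum(1 for x in composition.values() if x > 0)


def is_equimolar(composition, tol=1e-6):
    """True if all elements present share the same atomic fraction."""
    fractions = [x for x in composition.values() if x > 0]
    return all(isclose(x, fractions[0], abs_tol=tol) for x in fractions)


def split(records, in_train):
    """Partition records into (train, test) using a boolean predicate."""
    train = [r for r in records if in_train(r)]
    test = [r for r in records if not in_train(r)]
    return train, test


# Hypothetical toy records; a real dataset would also carry structures and DFT targets.
records = [
    {"composition": {"Fe": 0.5, "Ni": 0.5}, "ordered": True},
    {"composition": {"Fe": 0.75, "Ni": 0.25}, "ordered": True},
    {"composition": {"Cr": 1 / 3, "Fe": 1 / 3, "Ni": 1 / 3}, "ordered": False},
    {"composition": {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}, "ordered": False},
    {"composition": {"Co": 0.1, "Cr": 0.2, "Fe": 0.3, "Mn": 0.2, "Ni": 0.2}, "ordered": False},
]

# 1. Train on ordered structures, test on disordered ones.
ordered_train, disordered_test = split(records, lambda r: r["ordered"])

# 2. Train on low-order alloys (assumed here: at most 3 components), test on the rest.
low_train, high_test = split(records, lambda r: n_components(r["composition"]) <= 3)

# 3. Train on equimolar compositions, test on non-equimolar ones.
equi_train, nonequi_test = split(records, lambda r: is_equimolar(r["composition"]))
```

A model would then be fitted on each training subset and scored on the matching test subset. Comparing that score against an in-distribution baseline shows how much accuracy is lost when extrapolating.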