Abstract: Machine learning of materials properties measured by experiments is valuable yet difficult due to the limited amount of experimental data. In this work, we use a multi-fidelity random forest model to learn the experimental formation enthalpy of materials with a prediction accuracy higher than that of the empirically corrected PBE functional (PBEfe) and the meta-GGA functional SCAN, and it outperforms the widely studied deep neural-network-based representation learning and transfer learning. We then use the model to calibrate the DFT formation enthalpy in the Materials Project database and discover materials with underestimated stability. The multi-fidelity model is also used as a data-mining approach to find how DFT deviates from experiments by explaining the model output.
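A minimal sketch of what "multi-fidelity" can mean for a random forest follows. It assumes that the low-fidelity DFT formation enthalpy is used twice: once appended to the descriptor vector (feature level) and once as a baseline, with the forest learning the experiment-minus-DFT residual (label level). This is an illustration under those assumptions, not the authors' code. The descriptor vectors, the hyperparameters (`n_trees`, `max_depth`, `min_leaf`), the p/3 feature-subsampling rule, and every function name are hypothetical. In practice a library forest such as scikit-learn's `RandomForestRegressor` would replace the hand-written trees.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, depth, max_depth, min_leaf, n_sub, rng):
    """Grow a regression tree by greedy variance-reduction splits.

    A leaf is a float (mean target); an internal node is
    (feature_index, threshold, left_subtree, right_subtree).
    """
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)
    best = None  # (impurity, feature_index, threshold)
    for j in rng.sample(range(len(X[0])), n_sub):  # random feature subset at this split
        for t in sorted({row[j] for row in X})[:-1]:  # every distinct value except the largest
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return fmean(y)
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    args = (depth + 1, max_depth, min_leaf, n_sub, rng)
    return (j, t,
            grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx], *args),
            grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx], *args))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, min_leaf=2, seed=0):
    """Random forest: bootstrap samples plus per-split feature subsampling."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = max(1, p // 3)  # hypothetical choice: roughly p/3 features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, n_sub, rng))
    return forest


def fit_multifidelity(descriptors, dft, exp, **kwargs):
    """Hypothetical helper: train on DFT (low fidelity) and experiment (high fidelity)."""
    # Feature level: the DFT formation enthalpy becomes an extra input feature.
    X = [list(d) + [h] for d, h in zip(descriptors, dft)]
    # Label level: the forest learns the experiment-minus-DFT residual.
    y = [e - h for e, h in zip(exp, dft)]
    return fit_forest(X, y, **kwargs)


def predict_multifidelity(forest, descriptor, dft_value):
    """Calibrated enthalpy = DFT value + forest-averaged residual correction."""
    x = list(descriptor) + [dft_value]
    return dft_value + fmean(predict_tree(tree, x) for tree in forest)
```

Learning the residual rather than the raw experimental value lets the forest spend its capacity on the systematic DFT error, while the DFT input feature lets that correction depend on how large the computed enthalpy already is.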

What problem does this paper attempt to address?

The problem this paper addresses is calibrating the formation enthalpy (ΔHf) of materials obtained from density functional theory (DFT) calculations with multi-fidelity machine learning, in order to improve its prediction accuracy. Specifically, the paper aims to:

1. **Improve the accuracy of formation enthalpy prediction**: Use a multi-fidelity random forest model to learn experimentally measured formation enthalpies, reaching a higher prediction accuracy than the empirically corrected PBE functional, the GGA functional PBEsol, and the meta-GGA functionals SCAN and r2SCAN, and outperforming representation learning and transfer learning methods based on deep neural networks.
2. **Calibrate the DFT formation enthalpy in the Materials Project database**: Apply the trained model to the DFT formation enthalpies in the Materials Project database and identify materials whose stability is underestimated.
3. **Explain the deviation between DFT and experimental data**: Analyze the model output to find the reasons why DFT results deviate from experiment.

### Main contributions of the paper

- **Multi-fidelity machine learning model**: A multi-fidelity random forest model that uses DFT data at both the feature and label levels, significantly reducing the error in formation enthalpy prediction.
- **Performance comparison**: Compared with existing density functionals and other machine learning methods, the model predicts formation enthalpy more accurately.
- **Application examples**: Re-evaluating the materials in the Materials Project database revealed materials with underestimated stability, and a calibrated formation enthalpy data set is provided.
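One plausible way to identify materials with underestimated stability is to find compounds that lie above the convex hull when DFT enthalpies are used but reach the hull once the enthalpies are calibrated. The sketch below illustrates this for a single binary A-B system only. It assumes the trained correction is available as a plain function; the `Entry` fields, the `correction` callable, the tolerance, and all function names are hypothetical. A real Materials Project analysis would rebuild full multicomponent phase diagrams.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Entry:
    formula: str
    x_b: float     # atomic fraction of element B in the binary A-B system
    dh_dft: float  # DFT formation enthalpy, eV/atom


def hull_energy(x: float, points: Sequence[Tuple[float, float]]) -> float:
    """Lower convex hull of (x, dH) points evaluated at composition x.

    In one dimension the lower envelope at x is the minimum, over all pairs of
    points that bracket x, of the straight line joining them.
    """
    best = float("inf")
    for x1, e1 in points:
        for x2, e2 in points:
            if x1 <= x <= x2:
                e = e1 if x2 == x1 else e1 + (e2 - e1) * (x - x1) / (x2 - x1)
                best = min(best, e)
    return best


def energies_above_hull(xs: List[float], dhs: List[float]) -> List[float]:
    # Pure elements A (x = 0) and B (x = 1) have dH = 0 by definition.
    points = [(0.0, 0.0), (1.0, 0.0)] + list(zip(xs, dhs))
    return [dh - hull_energy(x, points) for x, dh in zip(xs, dhs)]


def underestimated_stability(entries: List[Entry],
                             correction: Callable[[Entry], float],
                             tol: float = 1e-6) -> List[str]:
    """Formulas that are above the DFT hull but on the calibrated hull."""
    xs = [e.x_b for e in entries]
    dft = [e.dh_dft for e in entries]
    calibrated = [h + correction(e) for e, h in zip(entries, dft)]
    before = energies_above_hull(xs, dft)
    after = energies_above_hull(xs, calibrated)
    return [e.formula for e, b, a in zip(entries, before, after) if b > tol and a <= tol]
```

Comparing hull positions, rather than only checking whether ΔHf became more negative, matters because a compound's stability is measured against competing phases, and those phases are recalibrated as well.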
### Key technologies

- **Multi-fidelity machine learning**: Combines low-fidelity DFT data with high-fidelity experimental data and improves prediction accuracy by learning the difference between them.
- **Random forest model**: Uses DFT data as input at both the feature and label levels, which improves the model's predictive performance.
- **Data set construction**: Merges the IIT and SSUB data sets; after deduplication, 1,143 data points with experimental formation enthalpies remain.

### Conclusion

The multi-fidelity random forest model improves the prediction accuracy of formation enthalpy, corrects the DFT results in the Materials Project database, and identifies materials with underestimated stability. The approach also shows how abundant low-fidelity computed data and scarce high-fidelity experimental data can be combined in machine learning for materials science.
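The data set construction step, merging IIT and SSUB and removing duplicates, can be sketched as follows. The summary does not say how duplicates were identified or resolved, so this Python sketch assumes compounds are matched by normalized composition (atomic fractions, so `Fe2O3` and `Fe4O6` coincide) and that duplicate values are averaged. `composition_key`, `merge_datasets`, the `{formula: value}` input format, and the parenthesis-free formula parser are all hypothetical. Matching on composition alone would also merge polymorphs, which a real pipeline might need to keep apart.

```python
import re
from collections import defaultdict
from statistics import fmean

# Element symbol followed by an optional (possibly fractional) count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def composition_key(formula):
    """Map a simple formula (no parentheses) to sorted atomic fractions."""
    counts = defaultdict(float)
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return tuple(sorted((el, round(n / total, 6)) for el, n in counts.items()))


def merge_datasets(*datasets):
    """Merge {formula: dH_exp} tables, averaging values that share a composition."""
    grouped = defaultdict(list)
    for data in datasets:
        for formula, dh in data.items():
            grouped[composition_key(formula)].append(dh)
    return {key: fmean(values) for key, values in grouped.items()}
```

Averaging is only one option; keeping the value from a preferred source, or discarding compositions whose sources disagree beyond a threshold, are equally plausible deduplication rules.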

Calibrating DFT formation enthalpy calculations by multi-fidelity machine learning
