Calibrating DFT formation enthalpy calculations by multi-fidelity machine learning

Sheng Gong, Shuo Wang, Tian Xie, Woo Hyun Chae, Runze Liu, Jeffrey C. Grossman
DOI: https://doi.org/10.1021/jacsau.2c00235
2022-03-22
Abstract: Machine learning materials properties measured by experiments is valuable yet difficult due to the limited amount of experimental data. In this work, we use a multi-fidelity random forest model to learn the experimental formation enthalpy of materials with prediction accuracy higher than the empirically corrected PBE functional (PBEfe) and meta-GGA functional (SCAN), and it outperforms the hotly studied deep neural-network based representation learning and transfer learning. We then use the model to calibrate the DFT formation enthalpy in the Materials Project database, and discover materials with underestimated stability. The multi-fidelity model is also used as a data-mining approach to find how DFT deviates from experiments by explaining the model output.
Materials Science
What problem does this paper attempt to address?
This paper aims to calibrate material formation enthalpies (ΔHf) obtained from density functional theory (DFT) calculations using multi-fidelity machine learning, in order to improve their accuracy. Specifically, the paper aims to:

1. **Improve the accuracy of formation enthalpy prediction**: Use a multi-fidelity random forest model to learn experimentally measured formation enthalpies, with a prediction accuracy higher than that of the empirically corrected PBE functional and of other functionals such as PBEsol, SCAN, and r2SCAN (the latter two being meta-GGA functionals), and better than representation learning and transfer learning methods based on deep neural networks.
2. **Calibrate the DFT formation enthalpy in the Materials Project database**: Use the trained model to calibrate the DFT formation enthalpies in the Materials Project database and find materials with underestimated stability.
3. **Explain the deviation between DFT and experimental data**: Analyze the model output to find the reasons why DFT results deviate from experimental data.

### Main contributions of the paper

- **Multi-fidelity machine learning model**: Proposes a multi-fidelity random forest model that uses DFT data at both the feature and label levels, which significantly reduces the error in formation enthalpy prediction.
- **Performance comparison**: Compared with existing density functionals and other machine learning methods, the model predicts formation enthalpy more accurately.
- **Application examples**: Re-evaluating the materials in the Materials Project database reveals materials with underestimated stability, and a calibrated formation enthalpy data set is provided.
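
The screening step, finding materials whose stability DFT underestimates, can be illustrated with a minimal sketch. It assumes that "underestimated stability" means the calibrated formation enthalpy is lower (more negative) than the DFT value by more than some margin. The function name, the `(material_id, dft_dhf, calibrated_dhf)` tuple layout, the eV/atom unit, and the 0.05 threshold are all illustrative assumptions, not the paper's actual criterion.

```python
def underestimated_stability(entries, threshold=0.05):
    """Flag entries whose calibrated formation enthalpy lies below the DFT value.

    entries: iterable of (material_id, dft_dhf, calibrated_dhf) tuples,
             with enthalpies assumed to be in eV/atom (an assumption of this sketch).
    threshold: minimum downward shift required to flag an entry (hypothetical value).
    Returns a list of (material_id, shift), most negative shift first.
    """
    flagged = []
    for material_id, dft_dhf, calibrated_dhf in entries:
        # A negative shift means the calibrated value is more stable than DFT suggests.
        shift = calibrated_dhf - dft_dhf
        if shift < -threshold:
            flagged.append((material_id, shift))
    # Sort so the largest corrections toward stability come first.
    return sorted(flagged, key=lambda item: item[1])
```

In practice the paper's screening may rely on a different stability measure; this sketch only shows the comparison between calibrated and uncalibrated values.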
### Key technologies

- **Multi-fidelity machine learning**: Combine low-fidelity DFT data and high-fidelity experimental data, and improve prediction accuracy by learning the differences between them.
- **Random forest model**: Use DFT data as input simultaneously at the feature and label levels, which improves the prediction performance of the model.
- **Data set construction**: Integrate the IIT and SSUB data sets, and after deduplication, obtain 1,143 data points with experimental formation enthalpy.

### Conclusion

The paper improves the prediction accuracy of material formation enthalpy through the multi-fidelity random forest model, corrects the DFT calculation results in the Materials Project database, and finds materials with underestimated stability. The method not only improves the accuracy of formation enthalpy prediction, but also provides new ideas for machine learning applications in materials science.
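
The feature-level and label-level use of DFT data described under Key technologies can be sketched as a thin wrapper around any regressor with scikit-learn style `fit`/`predict` methods. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `MultiFidelityCalibrator`, the helper `make_random_forest`, the descriptor layout, and the hyperparameters are hypothetical. At the feature level the DFT value is appended to each descriptor vector; at the label level the model learns the difference between experiment and DFT.

```python
def make_random_forest(n_trees=200, seed=0):
    """Build a scikit-learn random forest (hyperparameters are illustrative).

    Imported lazily so the wrapper below has no hard dependency on scikit-learn.
    """
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(n_estimators=n_trees, random_state=seed)


class MultiFidelityCalibrator:
    """Predict experimental formation enthalpy from descriptors plus the DFT value."""

    def __init__(self, regressor):
        # Any object with fit(X, y) and predict(X), e.g. make_random_forest().
        self.regressor = regressor

    @staticmethod
    def _augment(descriptors, dft_dhf):
        # Feature level: append the low-fidelity DFT value to each descriptor vector.
        return [list(x) + [d] for x, d in zip(descriptors, dft_dhf)]

    def fit(self, descriptors, dft_dhf, exp_dhf):
        # Label level: learn the experiment-minus-DFT difference, not the raw value.
        delta = [e - d for e, d in zip(exp_dhf, dft_dhf)]
        self.regressor.fit(self._augment(descriptors, dft_dhf), delta)
        return self

    def predict(self, descriptors, dft_dhf):
        # Calibrated enthalpy = DFT value + predicted correction.
        correction = self.regressor.predict(self._augment(descriptors, dft_dhf))
        return [d + float(c) for d, c in zip(dft_dhf, correction)]
```

With scikit-learn installed, `MultiFidelityCalibrator(make_random_forest()).fit(X, dft, exp)` would train on the experimental subset, and `predict` could then be applied to entries that only have DFT values.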