The Application of Supervised Learning Algorithms in Predicting the Formation Energy of NLO Crystals

Ai Wang, Yaohui Yin, Zhixin Sun, Guangyong Jin, Chao Xin
DOI: https://doi.org/10.1002/adts.202400048
2024-05-30
Advanced Theory and Simulations
Abstract: The research focuses on predicting the formation energy of nonlinear optical crystals in order to identify crystals with enhanced stability. Using compositional data, regression models are built, with gradient boosting regression showing exceptional performance (R²: 0.935, RMSE: 0.248 eV per atom). SHapley Additive exPlanations analysis provides insights into feature contributions. Validation through first‐principles calculations for GaP, ZnGeP2, and CdSiP2 confirms model accuracy (error range: ≈0.1 eV per atom). Nonlinear optical (NLO) crystals are a key class of functional materials in the field of laser technology due to their excellent frequency conversion effects and physical–chemical stability. The research aims to find NLO crystals with superior stability by predicting their formation energy. In this study, only compositional information is used as input features, and models are constructed using regression algorithms such as Random Forest Regression (RFR), Support Vector Regression (SVR), and Gradient Boosting Regression (GBR). Notably, the GBR model exhibits outstanding predictive performance, with an R² value of 0.935 and a root mean square error (RMSE) of 0.248 eV per atom. Additionally, SHapley Additive exPlanations (SHAP) analysis is employed to elucidate the fundamental principles behind the predictions by assessing the contribution of each feature to the formation energy. To validate the reliability of the models, first‐principles calculations are conducted to obtain the formation energies of GaP, ZnGeP2, and CdSiP2. The error between the model predictions and the values calculated with the Generalized Gradient Approximation (GGA) is ≈0.1 eV per atom, confirming the accuracy of the models.
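The two metrics quoted in the abstract, R² and RMSE, can be illustrated with a minimal standard-library sketch. The function names and the toy formation-energy values below are assumptions made purely for illustration; they are not data or code from the paper, and the authors' own evaluation pipeline is not described here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs (e.g. eV per atom)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values (eV per atom), NOT taken from the paper.
reference = [-1.0, -0.5, -2.0, -1.5]   # stand-in for calculated formation energies
predicted = [-0.9, -0.6, -1.8, -1.6]   # stand-in for model predictions

print("RMSE:", rmse(reference, predicted))
print("R2:  ", r2_score(reference, predicted))
```

An R² close to 1 together with a small RMSE indicates that the predictions track the reference values closely, which is how the reported GBR figures (R² of 0.935, RMSE of 0.248 eV per atom) should be read.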
multidisciplinary sciences