Feature-Assisted Machine Learning for Predicting Band Gaps of Binary Semiconductors

Sitong Huo, Shuqing Zhang, Qilin Wu, Xinping Zhang
DOI: https://doi.org/10.3390/nano14050445
IF: 5.3
2024-02-28
Nanomaterials
Abstract: The band gap is a key parameter in semiconductor materials that is essential for advancing optoelectronic device development. Accurately predicting band gaps of materials at low cost is a significant challenge in materials science. Although many machine learning (ML) models for band gap prediction already exist, they often suffer from low interpretability and lack theoretical support from a physical perspective. In this study, we address these challenges by using a combination of traditional ML algorithms and the ‘white-box’ sure independence screening and sparsifying operator (SISSO) approach. Specifically, we enhance the interpretability and accuracy of band gap predictions for binary semiconductors by integrating the importance rankings of support vector regression (SVR), random forests (RF), and gradient boosting decision trees (GBDT) with SISSO models. Our model uses only the intrinsic features of the constituent elements and their band gaps calculated using the Perdew–Burke–Ernzerhof method, significantly reducing computational demands. We have applied our model to predict the band gaps of 1208 theoretically stable binary compounds. Importantly, the model highlights the critical role of electronegativity in determining material band gaps. This insight not only enriches our understanding of the physical principles underlying band gap prediction but also underscores the potential of our approach in guiding the synthesis of new and valuable semiconductor materials.
Categories: materials science, multidisciplinary; physics, applied; nanoscience & nanotechnology; chemistry
What problem does this paper attempt to address?
The paper addresses the problem of predicting the band gaps of binary semiconductors accurately and at low computational cost. Many machine learning models for band gap prediction already exist, but they tend to be hard to interpret and lack grounding in physical theory. The authors aim to improve both accuracy and interpretability by combining traditional machine learning algorithms with the "white-box" sure independence screening and sparsifying operator (SISSO) approach. Specifically, they propose a feature-assisted method that integrates the feature-importance rankings produced by support vector regression (SVR), random forests (RF), and gradient boosting decision trees (GBDT) with SISSO models. The inputs are limited to intrinsic features of the constituent elements and band gaps calculated with the Perdew–Burke–Ernzerhof method, which keeps the computational cost low. The authors apply the resulting model to 1208 theoretically stable binary compounds, and it highlights electronegativity as a key factor in determining a material's band gap, which adds to the physical understanding behind band gap prediction.
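To make the idea of integrating importance rankings more concrete, here is a minimal sketch of one plausible way to merge per-model rankings into a shortlist of features for a downstream symbolic model such as SISSO. It is an illustration only, not the authors' procedure: the summary does not say how the three rankings are combined, so average-rank aggregation is an assumption, as are the function names, the feature names, and every number in the `importances` table (they are made up, not results from the paper).

```python
from statistics import mean

# Hypothetical importance scores per model. All values are made up for
# illustration and are NOT taken from the paper.
importances = {
    "SVR": {
        "electronegativity_diff": 0.41,
        "pbe_gap": 0.35,
        "atomic_radius_sum": 0.15,
        "valence_electrons": 0.09,
    },
    "RF": {
        "electronegativity_diff": 0.30,
        "pbe_gap": 0.48,
        "atomic_radius_sum": 0.10,
        "valence_electrons": 0.12,
    },
    "GBDT": {
        "electronegativity_diff": 0.44,
        "pbe_gap": 0.38,
        "atomic_radius_sum": 0.10,
        "valence_electrons": 0.08,
    },
}


def rank_features(scores):
    """Map each feature to its rank within one model (1 = most important)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def aggregate_ranks(model_scores):
    """Average each feature's rank across all models (assumed aggregation rule)."""
    per_model = [rank_features(scores) for scores in model_scores.values()]
    features = per_model[0].keys()
    return {f: mean(ranks[f] for ranks in per_model) for f in features}


def top_k(model_scores, k):
    """Return the k features with the best (lowest) average rank.

    Ties are broken alphabetically so the result is deterministic.
    """
    avg = aggregate_ranks(model_scores)
    return sorted(avg, key=lambda f: (avg[f], f))[:k]


if __name__ == "__main__":
    # Shortlist that would be passed on as the candidate feature set.
    print(top_k(importances, 2))
```

In practice the scores would come from fitted models rather than a hand-written table. With scikit-learn, for example, tree ensembles expose `feature_importances_`, while a kernel SVR has no built-in importances and would need something like `sklearn.inspection.permutation_importance`. Working with ranks rather than raw scores sidesteps the fact that importance values from different model families are not on a common scale.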