Predicting High‐Performance Thermoelectric Materials With StarryData2

Nuttawat Parse, Jose Recatala‐Gomez, Ruiming Zhu, Andre KY Low, Kedar Hippalgaonkar, Tomoya Mato, Yukari Katsura, Supree Pinitsoontorn
DOI: https://doi.org/10.1002/adts.202400308
2024-08-19
Advanced Theory and Simulations
Abstract: In recent years, machine learning (ML) has emerged as a promising tool for exploring thermoelectric (TE) materials. This study uses the StarryData2 public database to build an ML model that predicts the figure‐of‐merit (ZT) of TE materials. The original StarryData2 dataset (372,480 datapoints) was systematically cleaned, yielding a refined dataset of 18,126 instances covering 2,761 unique compounds. The cleaned data are used to train an XGBoost regressor that takes the chemical formulas of TE compounds as features and predicts ZT at a given temperature. The XGBoost regressor achieved high prediction accuracy on the test set, with a coefficient of determination (R²) of 0.815 and a mean absolute error (MAE) of 0.103, and was further evaluated through 5‐fold cross‐validation. Learning curve analysis showed that model performance improves with more training data. The contributions of different chemical descriptors to ZT are examined through feature importance analysis. Beyond the conventional TE families in the training set, the trained model is applied to predict ZT for promising unexplored TE materials and to estimate optimal doping concentrations. The study illustrates the impact of ML on TE materials research, offering insights and accelerating the discovery of materials with enhanced TE properties.
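As a rough illustration of the workflow described in the abstract, the following minimal sketch shows one way a chemical formula and a temperature could be turned into a numeric feature row, and how the two reported metrics (R² and MAE) are defined. It is not the authors' pipeline: the atomic-fraction encoding stands in for the chemical descriptors used in the paper, the parser only handles flat formulas without parentheses, and every function name here is hypothetical. In practice the feature rows would be passed to a regressor such as XGBoost.

```python
import re
from collections import defaultdict

# Element symbol followed by an optional integer or decimal amount.
# Assumption: flat formulas only (no parentheses, hydrates, or charges).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def composition_fractions(formula):
    """Parse a flat formula such as 'Bi2Te3' into atomic fractions."""
    counts = defaultdict(float)
    for element, amount in _TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def feature_row(formula, temperature_k, elements):
    """Build one feature row: atomic fractions in a fixed element order,
    followed by the temperature (hypothetical stand-in for real descriptors)."""
    fractions = composition_fractions(formula)
    return [fractions.get(el, 0.0) for el in elements] + [float(temperature_k)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    elements = ["Bi", "Sb", "Te"]  # illustrative element vocabulary
    print(feature_row("Bi0.5Sb1.5Te3", 300, elements))
```

Fixing the element order up front keeps every row the same length, which is what a tree-based regressor expects; the temperature is appended as an ordinary feature so that the same compound can be queried at several temperatures.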
multidisciplinary sciences