Dealing with the big data challenges in AI for thermoelectric materials

Xue Jia, Alex Aziz, Yusuke Hashimoto, Hao Li

DOI: https://doi.org/10.1007/s40843-023-2777-2

2024-03-14

Science China Materials

Abstract: The development of artificial intelligence (AI), particularly data science and machine learning (ML), is revolutionizing the field of materials science. Yet some key challenges remain, including errors contained in large-scale materials datasets and the overfitting of predicted temperature-dependent properties. In this work, using thermoelectric (TE) materials as an archetypal example, we first performed a series of rational actions to identify and discard questionable data, and obtained 92,291 data points consisting of 7295 compositions at different temperatures from the Starrydata2 database. Next, we proposed a composition-based cross-validation method, emphasizing that data points with the same composition but different temperatures should not be split into different sets, in order to avoid overfitting. Then, we built ML models using the gradient boosting decision tree (GBDT) method and achieved R² values of ∼0.89, ∼0.90, and ∼0.89 on the training dataset, the test dataset, and new out-of-sample experimental data published in 2023, respectively, verifying the model's high accuracy in predicting newly available materials. Using this ML model, we carried out a large-scale evaluation of the stable materials in the Materials Project database, and Ge2Te5As2 and Ge3(Te3As)2 were predicted to exhibit high zT values. Density functional theory (DFT) calculations were then performed; the calculated maximum zT values were 1.98 and 2.12 for n- and p-type Ge2Te5As2, and 0.58 and 0.74 for n- and p-type Ge3(Te3As)2, respectively, indicating their potential as TE materials and supporting our ML model. This work presents an example of dealing with and overcoming big data challenges in AI for materials science.

materials science, multidisciplinary
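The composition-based cross-validation idea can be sketched in plain Python. If rows are split at random, the same composition measured at neighboring temperatures can land in both the training and the test fold, so the test score partly measures interpolation along temperature rather than prediction for unseen materials. The minimal sketch below assumes each record is a dict with a hypothetical "composition" key and assigns whole compositions to folds; it illustrates the principle and is not the authors' code.

```python
import random
from collections import defaultdict


def composition_group_kfold(records, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation
    in which all rows sharing a composition fall in the same fold.

    `records` is a list of dicts with a hypothetical "composition" key;
    rows measured at different temperatures share that key.
    """
    # Map each composition to the indices of all its (temperature) rows.
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec["composition"]].append(i)

    compositions = sorted(groups)
    if n_splits < 2 or n_splits > len(compositions):
        raise ValueError("n_splits must be between 2 and the number of compositions")

    # Shuffle compositions (not individual rows), then deal them into folds round-robin.
    random.Random(seed).shuffle(compositions)
    folds = [compositions[k::n_splits] for k in range(n_splits)]

    for k, test_comps in enumerate(folds):
        test_idx = sorted(i for c in test_comps for i in groups[c])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k
            for c in fold for i in groups[c]
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Toy placeholder data: four hypothetical compositions, three temperatures each.
    data = [{"composition": c, "T": t}
            for c in ("comp_A", "comp_B", "comp_C", "comp_D")
            for t in (300, 400, 500)]
    for train, test in composition_group_kfold(data, n_splits=2):
        print(sorted({data[i]["composition"] for i in test}))
```

For real projects, scikit-learn's `GroupKFold`, called with the composition labels passed as `groups`, enforces the same constraint that no composition appears on both sides of a split.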

What problem does this paper attempt to address?

The paper addresses the big data challenges that arise when artificial intelligence (AI) is applied to thermoelectric (TE) materials, with the goals of screening TE materials more efficiently and predicting their performance more accurately. Specifically, it tackles the following key issues:

1. **Data quality**: Large-scale TE databases contain erroneous entries, such as typographical errors in publications and experimental errors. The authors identify and discard questionable data through a series of rational screening steps.
2. **Overfitting**: When data points for the same composition at different temperatures are split between the training and test sets, a model can overfit temperature-dependent properties and appear more accurate than it is on genuinely new materials. To prevent this, the authors propose a composition-based cross-validation method that keeps all data points of a composition in the same set.
3. **Accurate prediction models**: Using the cleaned dataset, the authors build gradient boosting decision tree (GBDT) models to predict the dimensionless figure of merit, zT. The models achieve R² values of about 0.89, 0.90, and 0.89 on the training set, the test set, and new experimental data published in 2023, respectively.
4. **Prediction and validation of new materials**: Applying the model to stable materials from the Materials Project database, the authors predict several potential high-performance TE materials. Density functional theory (DFT) calculations further support the potential of two of them, Ge2Te5As2 and Ge3(Te3As)2, as TE materials.

In summary, the paper proposes a systematic approach to the big data challenges in TE materials research and shows how machine learning can accelerate the discovery of high-performance TE materials.
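The summary does not spell out the exact screening rules. One plausible kind of check, shown here purely as an assumption rather than the authors' procedure, uses the standard definition zT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, κ the thermal conductivity, and T the absolute temperature. When a record reports all four quantities alongside zT, a large mismatch between the reported and recomputed zT suggests a typographical or unit error. The field names, units, toy rows, and 20 % tolerance below are hypothetical and are not taken from the Starrydata2 schema.

```python
import math


def recomputed_zt(seebeck_uv_per_k, sigma_s_per_m, kappa_w_per_mk, temperature_k):
    """zT = S^2 * sigma * T / kappa, with S converted from uV/K to V/K."""
    s = seebeck_uv_per_k * 1e-6
    return s * s * sigma_s_per_m * temperature_k / kappa_w_per_mk


def flag_questionable(rows, rel_tol=0.2):
    """Return (index, reason) pairs for rows that fail simple sanity checks.

    Each row is a dict with hypothetical keys "S" (uV/K), "sigma" (S/m),
    "kappa" (W/m/K), "T" (K) and the reported "zT". The 20 % tolerance is
    an illustrative assumption, not a published threshold.
    """
    flagged = []
    for i, r in enumerate(rows):
        # Physically impossible inputs: zT cannot be recomputed meaningfully.
        if r["T"] <= 0 or r["sigma"] <= 0 or r["kappa"] <= 0:
            flagged.append((i, "non-positive T, sigma or kappa"))
            continue
        zt = recomputed_zt(r["S"], r["sigma"], r["kappa"], r["T"])
        # A reported zT far from S^2*sigma*T/kappa hints at a typo or unit slip.
        if not math.isclose(r["zT"], zt, rel_tol=rel_tol):
            flagged.append((i, f"reported zT {r['zT']} vs recomputed {zt:.3g}"))
    return flagged


if __name__ == "__main__":
    # Made-up toy rows, not real measurements.
    rows = [
        {"S": 200, "sigma": 1e5, "kappa": 1.5, "T": 300, "zT": 0.8},
        {"S": 200, "sigma": 1e5, "kappa": 1.5, "T": 300, "zT": 8.0},  # decimal slip
        {"S": 200, "sigma": 1e5, "kappa": 0.0, "T": 300, "zT": 0.8},  # bad kappa
    ]
    print(flag_questionable(rows))
```

Rows flagged this way would still need manual review: a mismatch can also arise legitimately when S, σ, and κ were measured on different samples or at slightly different temperatures.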
