Machine-Learning Predictions of Critical Temperatures from Chemical Compositions of Superconductors

Son Gyo Jung, Guwon Jung, Jacqueline M Cole
DOI: https://doi.org/10.1021/acs.jcim.4c01137
2024-10-14
Abstract: In the quest for advanced superconducting materials, the accurate prediction of critical temperatures (Tc) poses a formidable challenge, largely due to the complex interdependencies between superconducting properties and the chemical and structural characteristics of a given material. To address this challenge, we have developed a machine-learning framework that aims to elucidate these complicated and hitherto poorly understood structure-property and property-property relationships. This study introduces a novel machine-learning-based workflow, termed Gradient Boosted Feature Selection (GBFS), which has been tailored to predict Tc for superconductors by employing a distributed gradient-boosting framework. This approach integrates exploratory data analyses, statistical evaluations, and multicollinearity reduction techniques to select highly relevant features from a high-dimensional feature space, derived solely from the chemical composition of materials. Our methodology was rigorously tested on a data set comprising approximately 16,400 chemical compounds with around 12,000 unique chemical compositions. The GBFS workflow enabled the development of a classification model that distinguishes compositions likely to exhibit Tc values greater than 10 K. This model achieved a weighted average F1-score of 0.912, an AUC-ROC of 0.986, and an average precision score of 0.919. Additionally, the GBFS workflow underpinned a regression model that predicted Tc values with an R² of 0.945, an MAE of 3.54 K, and an RMSE of 6.57 K on a test set obtained via random splitting. Further exploration was conducted through out-of-sample Tc predictions, particularly those exceeding the liquid nitrogen temperature, and out-of-distribution predictions for (Ca1-xLax)FeAs2 based on varying lanthanum content.
The outcome of our study underscores the significance of systematic feature analysis and selection in enhancing predictive model performance, offering various advantages over models that rely primarily on algorithmic complexity. This research not only advances the field of superconductivity but also sets a precedent for the application of machine learning in materials science.
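The abstract names two ideas that are easy to illustrate in code: reducing multicollinearity among candidate features, and scoring a regression model by MAE, RMSE, and R². The sketch below is not the authors' GBFS implementation. It assumes a simple greedy rule (keep a feature only if its absolute Pearson correlation with every already-kept feature is below a threshold), and it assumes the importance scores have already been obtained, for example from a gradient-boosted model. The feature names, the importance values, the 0.9 threshold, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(features, importance, threshold=0.9):
    """Greedy multicollinearity filter (an assumed rule, not the paper's).

    Features are visited from most to least important; one is kept only if
    its |r| with every feature kept so far is below `threshold`.
    """
    ranked = sorted(features, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy, made-up data: three composition-derived features for five compounds.
features = {
    "mean_atomic_mass": [1.0, 2.0, 3.0, 4.0, 5.0],
    "scaled_atomic_mass": [2.0, 4.0, 6.0, 8.0, 10.0],  # redundant copy
    "electronegativity_range": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hypothetical importance scores, e.g. from a gradient-boosted model.
importance = {
    "mean_atomic_mass": 0.5,
    "scaled_atomic_mass": 0.3,
    "electronegativity_range": 0.2,
}

print(drop_collinear(features, importance))
print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```

In this toy case the redundant feature is discarded because it is perfectly correlated with a more important one, while the weakly correlated feature survives. Ranking by importance before filtering means that, within a correlated group, the feature the model found most useful is the one retained.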