A study of machine-learning-derived formulas using artificially generated dataset

Donggeon Lee, Sooran Kim
DOI: https://doi.org/10.1007/s40042-024-01103-w
2024-05-31
Journal of the Korean Physical Society
Abstract: In this study, we investigate the effectiveness of machine learning (ML) models in constructing empirical formulas for the superconducting transition temperature ($T_c$) by comparing ML-derived equations with McMillan's equation. We generated an artificial dataset of 10,000 points from McMillan's equation and used a parametric brute-force search (BFS) algorithm to find model equations while varying model complexity and dataset size. The BFS models that use the Debye temperature and electron–phonon coupling as features achieve an RMSE of 0.830 K and an $R^2$ of 0.976 even with a small dataset of 100 points. The ML-derived formula is also close to McMillan's equation, showing a linear relationship between the Debye temperature and $T_c$ and a cubic relationship between electron–phonon coupling and $T_c$. Furthermore, we analyzed feature contributions using non-parametric random forest (RF) regression and found that electron–phonon coupling is strongly relevant to $T_c$. Our results demonstrate that feature selection and model complexity matter more for predicting $T_c$ effectively than simply adding more data.
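The workflow described in the abstract (generate data from McMillan's equation, then search candidate formulas by brute force) can be illustrated with a minimal sketch. This is not the authors' BFS implementation. It assumes a fixed Coulomb pseudopotential of 0.13, uniform sampling ranges for the Debye temperature and the electron–phonon coupling, and a simple one-coefficient power-law family $T_c \approx a\,\Theta_D^{p}\lambda^{q}$ with small integer exponents. The function names `mcmillan_tc`, `make_dataset`, and `brute_force_power_law` are hypothetical. The exponents this toy search selects depend on the assumed ranges and need not match those reported in the paper.

```python
import math
import random


def mcmillan_tc(theta_d, lam, mu_star=0.13):
    """McMillan's equation for T_c in kelvin.

    T_c = (theta_d / 1.45) * exp(-1.04 (1 + lam) / (lam - mu_star (1 + 0.62 lam)))
    mu_star = 0.13 is an assumed value, not taken from the paper.
    """
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0.0:
        return 0.0  # outside the formula's valid range: treat as no superconductivity
    return theta_d / 1.45 * math.exp(-1.04 * (1.0 + lam) / denom)


def make_dataset(n, seed=0):
    """Sample n rows of (theta_d, lam, T_c). The sampling ranges are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta_d = rng.uniform(100.0, 500.0)  # Debye temperature in K (assumed range)
        lam = rng.uniform(0.3, 1.5)          # electron-phonon coupling (assumed range)
        rows.append((theta_d, lam, mcmillan_tc(theta_d, lam)))
    return rows


def brute_force_power_law(rows, exponents=range(0, 5)):
    """Try every (p, q) in T_c ~ a * theta_d**p * lam**q and keep the lowest RMSE."""
    y = [tc for _, _, tc in rows]
    best = None
    for p in exponents:
        for q in exponents:
            f = [theta_d ** p * lam ** q for theta_d, lam, _ in rows]
            # Closed-form least-squares coefficient for the one-parameter model y ~ a * f.
            a = sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * fi for fi in f)
            sse = sum((yi - a * fi) ** 2 for fi, yi in zip(f, y))
            rmse = math.sqrt(sse / len(y))
            if best is None or rmse < best["rmse"]:
                best = {"p": p, "q": q, "a": a, "rmse": rmse, "sse": sse}
    # Coefficient of determination for the winning candidate.
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    best["r2"] = 1.0 - best["sse"] / sst
    return best


if __name__ == "__main__":
    # A small dataset of 100 points, mirroring the smallest size quoted in the abstract.
    print(brute_force_power_law(make_dataset(100)))
```

Because McMillan's equation is exactly proportional to the Debye temperature, a search of this kind should recover a linear dependence on that feature. The dependence on the coupling is only approximated by a power law, so its selected exponent reflects the sampled range.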
physics, multidisciplinary