Artificial intelligence real-time prediction and physical interpretation of atomic binding energies in nano-scale metal clusters

Philomena Schlexer Lamoureux, Tej S. Choksi, Verena Streibel, Frank Abild-Pedersen
DOI: https://doi.org/10.48550/arXiv.2005.02572
2020-05-06
Abstract: Single atomic sites often determine the functionality and performance of materials, such as catalysts, semiconductors or enzymes. Computing and understanding the properties of such sites is therefore a crucial component of the rational materials design process. Because of complex electronic effects at the atomic level, atomic site properties are conventionally derived from computationally expensive first-principles calculations, as this level of theory is required to achieve relevant accuracy. In this study, we present a widely applicable machine learning (ML) approach to compute atomic site properties with high accuracy in real time. The approach works well for complex non-crystalline atomic structures and therefore opens up the possibility for high-throughput screenings of nano-materials, amorphous systems and materials interfaces. Our approach includes a robust featurization scheme to transform atomic structures into features which can be used by common machine learning models. Performing a genetic algorithm (GA) based feature selection, we show how to establish an intuitive physical interpretation of the structure-property relations implied by the ML models. With this approach, we compute atomic site stabilities of metal nanoparticles ranging from 3 to 55 atoms with mean absolute errors in the range of 0.11-0.14 eV in real time. We also establish the chemical identity of the site as the most important factor in determining atomic site stabilities, followed by structural features like bond distances and angles. Both the featurization and the GA feature selection functionality are published in open-source Python modules. With this method, we enable the efficient rational design of highly specialized real-world nano-catalysts through data-driven materials screening.
Chemical Physics, Computational Physics
What problem does this paper attempt to address?
The paper addresses how to use machine learning to predict atomic binding energies in nano-scale metal clusters in real time and to interpret the predictions physically. Specifically, the authors propose a widely applicable machine-learning method that computes atomic-site properties in real time with high accuracy. The method works for complex non-crystalline atomic structures, which makes high-throughput screening of nanomaterials possible.

### Background and Objectives of the Paper

In materials science, single atomic sites often determine the functionality and performance of materials such as catalysts, semiconductors or enzymes. Computing and understanding the properties of these sites is therefore a key step in rational materials design. Because of complex electronic effects at the atomic level, these properties are conventionally obtained from first-principles calculations, which reach the required accuracy but are computationally expensive. The paper proposes a machine-learning method that predicts atomic-site properties with high accuracy in real time while remaining physically interpretable.

### Main Contributions

1. **Featurization Scheme**:
   - Proposes a new featurization scheme that converts atomic structures into features usable by common machine-learning models.
   - The scheme depends only on basic chemical and geometric information and does not require expensive first-principles calculations.
2. **High-Throughput Screening**:
   - The method applies to aperiodic, disordered (amorphous) and ordered (crystalline) structures, which makes high-throughput screening of nanomaterials and materials interfaces possible.
3. **Physical Interpretability**:
   - Feature selection with a genetic algorithm (GA) establishes an intuitive physical interpretation of the structure-property relations.
   - The analysis shows that the chemical identity of the site (i.e., which element occupies it) is the most important factor in determining atomic-site stability, followed by structural features such as bond lengths and bond angles.
4. **Application Examples**:
   - The authors predict the stability of atomic sites in metal nanoparticles of 3 to 55 atoms, with a mean absolute error (MAE) between 0.11 and 0.14 eV.

### Method Overview

1. **Data Generation**:
   - Generated monometallic and bimetallic sub-nanometer clusters of 3 to 13 atoms, as well as cuboctahedral nanoparticles and surface structures of 55 atoms.
2. **Featurization**:
   - Extracted 28 unique features, including the number of atoms, coordination number, bond length, bond angle, etc.
3. **Model Training and Evaluation**:
   - Trained and evaluated multiple machine-learning models (linear regression, Gaussian process regression, neural networks, random forests and extreme gradient-boosted trees).
   - Selected the best model through 4-fold cross-validation and hyperparameter optimization.
4. **Feature Importance Analysis**:
   - Determined the most important features through genetic-algorithm feature selection.
   - The results show that chemical features (such as the number of valence electrons of the atoms) are the most critical factors in determining atomic-site stability, followed by structural features (such as coordination number).

### Conclusions

The study proposes an efficient and physically interpretable machine-learning method that predicts the stability of atomic sites in nano-scale metal clusters with high accuracy in real time. The method provides a powerful tool for high-throughput screening and rational design of nanomaterials.
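The genetic-algorithm feature selection and 4-fold cross-validation mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' published module: the function names (`kfold_indices`, `cv_mae`, `ga_select`, `make_toy_data`), the GA settings, and the synthetic data are all hypothetical, and a small k-nearest-neighbour regressor stands in for the models listed in the summary. Each candidate is a binary mask over the feature columns, scored by its cross-validated mean absolute error.

```python
import random


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in model: average the targets of the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cv_mae(X, y, mask, k_folds=4):
    """Cross-validated mean absolute error using only the masked-in columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty feature set cannot be scored
    Xs = [[row[j] for j in cols] for row in X]
    errors = []
    for fold in kfold_indices(len(Xs), k_folds):
        held_out = set(fold)
        train = [i for i in range(len(Xs)) if i not in held_out]
        tX = [Xs[i] for i in train]
        ty = [y[i] for i in train]
        for i in fold:
            errors.append(abs(knn_predict(tX, ty, Xs[i]) - y[i]))
    return sum(errors) / len(errors)


def ga_select(X, y, pop_size=12, generations=10, p_mut=0.1, seed=0):
    """Search binary feature masks with an elitist genetic algorithm."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Include the all-features mask so the result is never worse than it.
    population = [[1] * n_feat] + [
        [rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size - 1)
    ]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_mae(X, y, mask)
        return cache[key]

    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n=80, n_feat=6, seed=1):
    """Synthetic data (assumption): only columns 0 and 2 carry signal."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0, 0.05) for row in X]
    return X, y


X, y = make_toy_data()
best_mask, best_mae = ga_select(X, y)
print("selected mask:", best_mask, "CV MAE:", round(best_mae, 3))
```

Because the surviving features are explicit columns rather than learned embeddings, the selected mask can be read directly as a statement about which descriptors matter, which is the kind of interpretability the summary attributes to the GA step. In practice the stand-in regressor would be replaced by whichever model performed best under cross-validation.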