Information-entropy-driven generation of material-agnostic datasets for machine-learning interatomic potentials

Aparna P. A. Subramanyam, Danny Perez
2024-07-15
Abstract: In contrast to their empirical counterparts, machine-learning interatomic potentials (MLIAPs) promise to deliver near-quantum accuracy over broad regions of configuration space. However, due to their generic functional forms and extreme flexibility, they can catastrophically fail to capture the properties of novel, out-of-sample configurations, making the quality of the training set a determining factor, especially when investigating materials under extreme conditions. In the present study, we propose a novel automated dataset generation method based on the maximization of the information entropy of the feature distribution, aiming at an extremely broad coverage of the configuration space in a way that is agnostic to the properties of specific target materials. The ability of the dataset to capture unique material properties is demonstrated on a range of unary materials, including elements with the fcc (Al), bcc (W), hcp (Be, Re and Os), and graphite (C) ground states. MLIAPs trained on this dataset are shown to be accurate over a broad range of application-relevant metrics, as well as extremely robust over very broad swaths of configuration space, even without dataset fine-tuning or hyper-parameter optimization, making the approach extremely attractive to rapidly and autonomously develop general-purpose MLIAPs suitable for simulations in extreme conditions.
Materials Science
What problem does this paper attempt to address?
This paper addresses the poor predictive performance of machine-learning interatomic potentials (MLIAPs) on novel, out-of-sample configurations, which arises when the training set does not adequately cover configuration space. Because MLIAPs have generic, highly flexible functional forms, the quality of the training set becomes a determining factor, especially for materials under extreme conditions. The paper proposes an automated dataset generation method that achieves broad coverage of configuration space by maximizing the information entropy of the feature distribution, without relying on the properties of any specific target material. The authors demonstrate the method on unary materials with fcc (Al), bcc (W), hcp (Be, Re, and Os), and graphite (C) ground states, showing that MLIAPs trained on this dataset are accurate over a range of application-relevant metrics and robust across broad regions of configuration space, even without dataset fine-tuning or hyperparameter optimization. The main goal is a systematic, simple, and fast route to general-purpose, highly transferable MLIAPs suitable for simulations under extreme conditions.
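To make the idea of "information entropy of the feature distribution" concrete, the following is a minimal sketch of how the entropy of a set of feature vectors could be estimated from samples. It is not the authors' implementation. The choice of the Kozachenko-Leonenko nearest-neighbor estimator, the function name knn_entropy, and the synthetic two-dimensional Gaussian "features" are all assumptions made for illustration; real atomic-environment descriptors would take the place of the random arrays.

```python
import math

import numpy as np


def knn_entropy(features: np.ndarray) -> float:
    """Kozachenko-Leonenko (k = 1) estimate of differential entropy, in nats.

    `features` has shape (n_samples, n_dims). Hypothetical helper: the
    estimator choice is an assumption, not taken from the paper.
    """
    x = np.asarray(features, dtype=float)
    n, d = x.shape

    # Squared pairwise distances via |a|^2 + |b|^2 - 2 a.b
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself

    # Distance from each sample to its nearest neighbor (floored to avoid log(0))
    r = np.sqrt(np.maximum(d2.min(axis=1), 1e-300))

    # log volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)

    # digamma(n) - digamma(1) equals the harmonic number H_{n-1}
    harmonic = float(np.sum(1.0 / np.arange(1, n)))

    return harmonic + log_unit_ball + d * float(np.mean(np.log(r)))


# Synthetic stand-ins for descriptor vectors (assumption: 2D Gaussians).
rng = np.random.default_rng(0)
narrow = rng.normal(scale=0.1, size=(1000, 2))  # tightly clustered "dataset"
broad = rng.normal(scale=1.0, size=(1000, 2))   # widely spread "dataset"

h_narrow = knn_entropy(narrow)
h_broad = knn_entropy(broad)
print(f"narrow: {h_narrow:.3f} nats, broad: {h_broad:.3f} nats")
```

For an isotropic two-dimensional Gaussian with standard deviation sigma, the exact differential entropy is ln(2*pi*e*sigma^2), so the broad set should score about 2*ln(10) nats higher than the narrow one. A dataset generator driven by this kind of objective would favor new configurations whose features fall far from existing samples, which is the sense in which entropy maximization rewards broad coverage of configuration space.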