Benchmarking Structural Evolution Methods for Training of Machine Learned Interatomic Potentials

Michael J. Waters, James M. Rondinelli
DOI: https://doi.org/10.1088/1361-648X/ac7f73
2022-03-30
Abstract: When creating training data for machine-learned interatomic potentials (MLIPs), it is common to create initial structures and evolve them using molecular dynamics to sample a larger configuration space. We benchmark two other modalities of evolving structures, contour exploration and dimer-method searches against molecular dynamics for their ability to produce diverse and robust training density functional theory data sets for MLIPs. We also discuss the generation of initial structures which are either from known structures or from random structures in detail to further formalize the structure-sourcing processes in the future. The polymorph-rich zirconium-oxygen composition space is used as a rigorous benchmark system for comparing the performance of MLIPs trained on structures generated from these structural evolution methods. Using Behler-Parrinello neural networks as our machine-learned interatomic potential models, we find that contour exploration and the dimer-method searches are generally superior to molecular dynamics in terms of spatial descriptor diversity and statistical accuracy.
Chemical Physics, Materials Science
### What problem does this paper attempt to solve?

This paper addresses how to sample configuration space more effectively when generating structural data for training machine-learned interatomic potentials (MLIPs). Specifically, the authors compare three structural evolution methods, **Contour Exploration (CE)**, **Dimer-Method Search (DM)**, and **Molecular Dynamics (MD)**, by their ability to generate diverse and robust density functional theory (DFT) data sets.

#### Main problems:

1. **Improve the diversity of training data**: Existing workflows usually rely on molecular dynamics to evolve the initial structures and sample the local configuration space. However, MD may not efficiently cover configurations that matter for important physical processes, such as the saddle points of diffusion and chemical reactions.
2. **Improve the robustness of training data**: MD samples rare events such as saddle-point crossings poorly, yet these events are crucial for many physical processes. Other structural evolution methods therefore need to be explored to improve the robustness of the training data.
3. **Optimize the generation of initial structures**: The paper also discusses how initial structures are generated, either from known structures or from random structures, with the aim of formalizing the structure-sourcing process.

#### Research objectives:

- **Compare the effects of different structural evolution methods**: Using the Zr-O system as a benchmark, evaluate how well CE, DM, and MD generate MLIP training data.
- **Improve the accuracy and efficiency of MLIPs**: By adopting more effective structural evolution methods, improve the predictive accuracy and computational efficiency of MLIPs.
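
The point about MD and saddle points can be made concrete with a rough equilibrium estimate. In a canonical ensemble, a configuration lying an energy ΔE above the local minimum is visited with relative weight exp(-ΔE / k_B T). The sketch below simply evaluates that factor; the barrier heights and temperatures are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_weight(barrier_ev: float, temperature_k: float) -> float:
    """Relative equilibrium weight of a configuration barrier_ev above the minimum."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-barrier_ev / (K_B_EV_PER_K * temperature_k))


# Hypothetical barrier heights (eV) and temperatures (K), for illustration only.
for barrier in (0.25, 0.5, 1.0):
    for temperature in (300.0, 1000.0, 2000.0):
        weight = boltzmann_weight(barrier, temperature)
        print(f"barrier={barrier:4.2f} eV  T={temperature:6.0f} K  weight={weight:.3e}")
```

The weight falls off exponentially with barrier height, which is why an MD trajectory of practical length may contain few or no frames near a saddle point unless the temperature is raised substantially.
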
#### Conclusions:

- **CE and DM methods are superior to MD**: In terms of spatial descriptor diversity and statistical accuracy, CE and DM generally outperform MD, with DM generating the highest data diversity.
- **Reduce adjustable parameters and redundant sampling**: The CE and DM methods have fewer adjustable parameters and sample the atomic configuration space more efficiently, thus reducing redundant sampling.

Through these improvements, the research provides more effective approaches for future MLIP training, especially when dealing with the high-dimensional configuration space of complex material systems.
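
This summary does not say how descriptor diversity was quantified. As a minimal sketch of one possible proxy (an assumption, not the authors' metric), the mean pairwise Euclidean distance between descriptor vectors gives a single number that grows as a data set spreads out in descriptor space. The function name and the two toy data sets below are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(descriptors):
    """Average Euclidean distance over all unordered pairs of descriptor vectors."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        raise ValueError("need at least two descriptor vectors")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-component descriptors for two made-up data sets.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print("clustered:", mean_pairwise_distance(clustered))
print("spread:   ", mean_pairwise_distance(spread))
```

A data set whose structures are near-duplicates scores low on this proxy, which is the sense in which redundant sampling reduces diversity. Real atomic-environment descriptors have many more components, but the same calculation applies.
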