Abstract: When creating training data for machine-learned interatomic potentials (MLIPs), it is common to create initial structures and evolve them using molecular dynamics to sample a larger configuration space. We benchmark two other modalities of evolving structures, contour exploration and dimer-method searches, against molecular dynamics for their ability to produce diverse and robust density functional theory training data sets for MLIPs. We also discuss in detail the generation of initial structures, either from known structures or from random structures, to further formalize structure-sourcing processes in the future. The polymorph-rich zirconium-oxygen composition space is used as a rigorous benchmark system for comparing the performance of MLIPs trained on structures generated by these structural evolution methods. Using Behler-Parrinello neural networks as our machine-learned interatomic potential models, we find that contour exploration and dimer-method searches are generally superior to molecular dynamics in terms of spatial descriptor diversity and statistical accuracy.

### What problem does this paper attempt to address?

This paper addresses how to sample configuration space more effectively when generating structural data for training machine-learned interatomic potentials (MLIPs). Specifically, the authors compare three structural evolution methods, **Contour Exploration (CE)**, **Dimer-Method Search (DM)**, and **Molecular Dynamics (MD)**, and evaluate how well each produces diverse and robust density functional theory (DFT) data sets.

#### Main problems:

1. **Improve the diversity of training data**: Existing workflows usually rely on molecular dynamics to evolve the initial structures and sample the local configuration space. However, MD may not efficiently cover configurations that matter for important physical processes, such as the saddle points involved in diffusion and chemical reactions.
2. **Improve the robustness of training data**: MD samples rare events such as saddle-point crossings poorly, yet these events are crucial for many physical processes. Other structural evolution methods therefore need to be explored to improve the robustness of the training data.
3. **Optimize the generation of initial structures**: The paper also discusses how initial structures are generated, either from known structures or from random structures, to further formalize the structure-sourcing process.

#### Research objectives:

- **Compare the effects of different structural evolution methods**: Using the Zr-O system as a benchmark, evaluate the performance of CE, DM, and MD in generating MLIP training data.
- **Improve the accuracy and efficiency of MLIPs**: By adopting more effective structural evolution methods, improve the predictive ability and computational efficiency of MLIPs.

#### Conclusions:

- **CE and DM methods are superior to MD**: CE and DM perform better in terms of spatial descriptor diversity and statistical accuracy, with DM generating the highest data diversity.
- **Reduce adjustable parameters and redundant sampling**: CE and DM have fewer adjustable parameters and sample the atomic configuration space more efficiently, which reduces redundant sampling.

Through these improvements, the research provides a more effective approach to future MLIP training, especially for the high-dimensional configuration spaces of complex material systems.
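To make the point about saddle points concrete, the following is a minimal sketch of a dimer-style search on a two-dimensional toy surface. It is not the paper's workflow or any library's implementation: the potential `V(x, y) = (x^2 - 1)^2 + y^2`, the function names, the step sizes, and the simple gradient-based rotation are all assumptions chosen for illustration. Plain steepest-descent relaxation is included only as a contrast (it is not molecular dynamics): started from the same point, it slides into a minimum, while the dimer search climbs to the saddle between the two wells.

```python
import numpy as np

def grad(p):
    """Gradient of the assumed toy surface V(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = p
    return np.array([4.0 * x * (x * x - 1.0), 2.0 * y])

def curvature_along(p, n, delta=1e-3):
    """Finite-difference Hessian-vector product H n and curvature n.H.n,
    estimated from gradients at the two dimer images p +/- delta * n."""
    hn = (grad(p + delta * n) - grad(p - delta * n)) / (2.0 * delta)
    return hn, float(n @ hn)

def lowest_mode(p, n, alpha=0.1, n_rot=20):
    """Rotate the unit vector n toward the lowest-curvature direction at p
    by a few gradient steps on the curvature (a simplified rotation rule)."""
    for _ in range(n_rot):
        hn, c = curvature_along(p, n)
        n = n - alpha * (hn - c * n)
        n = n / np.linalg.norm(n)
    _, c = curvature_along(p, n)
    return n, c

def dimer_search(p0, step=0.05, tol=1e-8, max_iter=2000):
    """Walk toward a first-order saddle point using the dimer effective force."""
    p = np.array(p0, dtype=float)
    n = np.array([1.0, 1.0]) / np.sqrt(2.0)  # arbitrary initial dimer axis
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            break
        n, c = lowest_mode(p, n)
        f = -g
        f_par = (f @ n) * n
        # Negative curvature: invert the force component along the mode.
        # Positive curvature: only climb along the mode to leave the basin.
        f_eff = f - 2.0 * f_par if c < 0.0 else -f_par
        p = p + step * f_eff
    return p

def relax(p0, step=0.05, tol=1e-8, max_iter=2000):
    """Plain steepest descent, shown for contrast with the dimer search."""
    p = np.array(p0, dtype=float)
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            break
        p = p - step * g
    return p

start = [0.6, 0.3]
saddle = dimer_search(start)   # ends at the saddle between the two wells
minimum = relax(start)         # ends in the nearest well
print("dimer search:", saddle)
print("relaxation:  ", minimum)
```

On this surface the minima sit at `(±1, 0)` and the only saddle is at the origin, so the two end points can be checked by hand. In a real data-generation pipeline the gradient would come from DFT forces, and configurations along the search path, not just the end point, would be candidates for the training set.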

Benchmarking Structural Evolution Methods for Training of Machine Learned Interatomic Potentials
