Active learning strategies for atomic cluster expansion models

Yury Lysogorskiy, Anton Bochkarev, Matous Mrovec, Ralf Drautz
DOI: https://doi.org/10.48550/arXiv.2212.08716
2022-12-17
Abstract: The atomic cluster expansion (ACE) was proposed recently as a new class of data-driven interatomic potentials with a formally complete basis set. Since the development of any interatomic potential requires a careful selection of training data and thorough validation, an automation of the construction of the training dataset as well as an indication of a model's uncertainty are highly desirable. In this work, we compare the performance of two approaches for uncertainty indication of ACE models based on the D-optimality criterion and ensemble learning. While both approaches show comparable predictions, the extrapolation grade based on the D-optimality (MaxVol algorithm) is more computationally efficient. In addition, the extrapolation grade indicator enables an active exploration of new structures, opening the way to the automated discovery of rare-event configurations. We demonstrate that active learning is also applicable to explore local atomic environments from large-scale MD simulations.
Materials Science
### What problem does this paper attempt to address?

This paper addresses how to automate the construction of training datasets and how to indicate model uncertainty when developing atomic cluster expansion (ACE) models. It focuses on four issues:

1. **Automated construction of training datasets**:
   - Developing any interatomic potential requires careful selection of training data and thorough validation.
   - Automating this process can improve efficiency and reduce human error.
2. **Evaluation of model uncertainty**:
   - Provide an effective indicator of the uncertainty of an ACE model, especially the extrapolation error on unseen atomic configurations.
   - Help ensure that the model stays reliable and transferable on data outside its training set.
3. **Comparison of uncertainty evaluation methods**:
   - Compare two uncertainty indicators, one based on the D-optimality criterion and one on ensemble learning.
   - Investigate the applicability and efficiency of both methods in structural space and in composition space.
4. **Application of active learning**:
   - Explore how active learning strategies can discover new structures automatically, especially rare-event configurations.
   - Demonstrate active learning for exploring local atomic environments in large-scale molecular dynamics (MD) simulations.

### Main research content

The paper explores these issues through the following aspects:

- **Introduction to the ACE model**: Introduces the basic principles of the ACE model and its advantages in describing atomic environments.
- **Uncertainty evaluation methods**:
  - **Ensemble learning**: Estimates uncertainty by training multiple models and statistically aggregating their predictions.
  - **D-optimality criterion**: Uses the MaxVol algorithm to select representative training data and computes an extrapolation grade as the uncertainty indicator.
- **Performance comparison**: Evaluates both uncertainty indicators in structural and composition space for the Cu (copper) and Al-Ni (aluminum-nickel alloy) systems and compares the results.
- **Active learning strategy**: Demonstrates how to use the uncertainty indicators to select high-value training samples and improve the model parameterization step by step.

### Conclusion

The paper shows that the two indicators give comparable predictions, while the extrapolation grade based on the D-optimality criterion (MaxVol algorithm) is more computationally efficient, which makes it well suited to active learning in large-scale simulations. These methods can improve the reliability and extend the applicable range of ACE models, and so speed up the development of machine-learning interatomic potentials.
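The two uncertainty indicators summarized above can be illustrated on a toy model that is linear in its basis functions. The sketch below is not the authors' implementation: the function names (`initial_rows`, `maxvol`, `extrapolation_grade`, `ensemble_std`), the random feature matrix standing in for ACE basis functions, the bootstrap construction of the ensemble, and the threshold value are all assumptions made for illustration. It selects an active set of rows with a simple MaxVol iteration, computes the extrapolation grade as the largest absolute coefficient needed to express a query vector through the active set, and, for contrast, computes the spread of an ensemble of least-squares fits.

```python
import numpy as np


def initial_rows(B):
    """Pick n linearly independent rows of B (m x n) by pivoted elimination.

    Assumes B has full column rank (hypothetical helper for this sketch).
    """
    R = B.astype(float).copy()
    rows = []
    for j in range(R.shape[1]):
        i = int(np.argmax(np.abs(R[:, j])))
        rows.append(i)
        # Eliminate column j everywhere; the pivot row itself becomes zero,
        # so it cannot be selected a second time.
        R -= np.outer(R[:, j] / R[i, j], R[i, :])
    return rows


def maxvol(B, tol=1e-2, max_iter=1000):
    """Simple MaxVol iteration: swap rows while a swap grows |det| by > 1 + tol."""
    rows = initial_rows(B)
    for _ in range(max_iter):
        C = B @ np.linalg.inv(B[rows])  # coefficients of every row in the active set
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= 1.0 + tol:
            break  # no row lies noticeably outside the active set
        rows[j] = int(i)  # swapping multiplies |det| by |C[i, j]| > 1
    return rows


def extrapolation_grade(b, A_inv):
    """Largest absolute expansion coefficient of b in the active-set rows."""
    return np.max(np.abs(b @ A_inv), axis=-1)


def ensemble_std(B_train, y_train, B_query, n_models=8, seed=0):
    """Spread of predictions from least-squares fits on bootstrap resamples.

    Bootstrapping is an assumption of this sketch, not the paper's recipe.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), len(y_train))
        coef, *_ = np.linalg.lstsq(B_train[idx], y_train[idx], rcond=None)
        preds.append(B_query @ coef)
    return np.std(preds, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.normal(size=(200, 5))           # stand-in for per-atom basis vectors
    y = B @ rng.normal(size=5) + 0.05 * rng.normal(size=200)
    A_inv = np.linalg.inv(B[maxvol(B)])

    queries = np.vstack([B[:3], 5.0 * B[:3]])  # familiar and far-away environments
    gamma = extrapolation_grade(queries, A_inv)
    sigma = ensemble_std(B, y, queries)
    threshold = 2.0                          # illustrative value, not from the paper
    for g, s in zip(gamma, sigma):
        print(f"grade={g:.2f}  ensemble_std={s:.3f}  select={g > threshold}")
```

In an active learning loop of the kind described above, configurations whose grade exceeds a chosen threshold would be sent for a reference calculation and added to the training set, after which the active set is recomputed. The grade needs one matrix-vector product per environment, whereas the ensemble requires evaluating every member model, which is consistent with the efficiency argument made in the abstract.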