Accelerating the Training and Improving the Reliability of Machine-Learned Interatomic Potentials for Strongly Anharmonic Materials through Active Learning

Kisung Kang, Thomas A. R. Purcell, Christian Carbogno, Matthias Scheffler
2024-09-18
Abstract: Molecular dynamics (MD) employing machine-learned interatomic potentials (MLIPs) serves as an efficient, urgently needed complement to ab initio molecular dynamics (aiMD). By training these potentials on data generated from ab initio methods, their averaged predictions can exhibit comparable performance to ab initio methods at a fraction of the cost. However, insufficient training sets might lead to an improper description of the dynamics in strongly anharmonic materials, because critical effects might be overlooked in relevant cases, or only incorrectly captured, or hallucinated by the MLIP when they are not actually present. In this work, we show that an active learning scheme that combines MD with MLIPs (MLIP-MD) and uncertainty estimates can avoid such problematic predictions. In short, efficient MLIP-MD is used to explore configuration space quickly, whereby an acquisition function based on uncertainty estimates and on energetic viability is employed to maximize the value of the newly generated data and to focus on the most unfamiliar but reasonably accessible regions of phase space. To verify our methodology, we screen over 112 materials and identify 10 examples experiencing the aforementioned problems. Using CuI and AgGaSe$_2$ as archetypes for these problematic materials, we discuss the physical implications for strongly anharmonic effects and demonstrate how the developed active learning scheme can address these issues.
Materials Science, Machine Learning
What problem does this paper attempt to address?
The paper addresses the limited reliability of machine-learned interatomic potentials (MLIPs) for strongly anharmonic materials, in particular when describing rare events and strongly anharmonic effects. Specifically, it identifies four problems:

1. **Insufficient training data**: The training set may not adequately cover the relevant configuration space, leading to erroneous predictions of the dynamics of strongly anharmonic materials.
2. **Handling of rare events**: Rare events in strongly anharmonic materials (such as phase-transition precursors or defect formation) may be missing from or underrepresented in the training data, so MLIPs perform poorly in these regions.
3. **Over-smoothing**: Regularization applied during training to improve generalization may smooth out rare events, so that they are no longer described accurately.
4. **False rare events**: MLIPs may generate spurious rare events that do not occur in the corresponding first-principles calculations.

To address these issues, the paper proposes an Active Learning (AL) scheme combined with Molecular Dynamics (MD), in which uncertainty estimates and energetic viability guide the acquisition of new data, thereby improving the reliability and accuracy of MLIPs for strongly anharmonic materials. The scheme includes:

- **Exploration and data sampling**: Efficient MLIP-MD is used to explore configuration space quickly, and unfamiliar regions are identified and sampled on the basis of uncertainty estimates and energetic viability.
- **Data acquisition**: Ab initio calculations are performed on the identified unfamiliar configurations to obtain reference forces, energies, and stresses, which are added to the training set before the MLIPs are retrained.
- **Incorporating uncertainty estimation**: The standard deviation of the predictions of multiple MLIP models is used to assess uncertainty, so that unfamiliar configurations are identified more effectively.
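
The ensemble-based uncertainty estimate described above can be written, as a sketch, as the standard deviation over $M$ independently trained models evaluated on a configuration $\mathbf{R}$. The symbols $M$, $E_m$, and $\bar{E}$, the use of the population form with $1/M$, and the choice of the energy as the monitored quantity are assumptions made here for illustration; the summary does not state which predicted quantity enters the estimate.

$$
\sigma_E(\mathbf{R}) = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\bigl(E_m(\mathbf{R}) - \bar{E}(\mathbf{R})\bigr)^2},
\qquad
\bar{E}(\mathbf{R}) = \frac{1}{M}\sum_{m=1}^{M} E_m(\mathbf{R})
$$

A large $\sigma_E$ signals that the models disagree on $\mathbf{R}$, which is taken as a sign that the configuration lies outside the region covered by the training data.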
Through these methods, the paper demonstrates how to improve the training efficiency and reliability of MLIPs in strongly anharmonic materials, especially when dealing with rare events and strong anharmonic effects.
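
The selection step of such an active learning loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Boltzmann-like viability weight, the product form of the score, and the example energies are all hypothetical. Each candidate is represented only by the energies that several ensemble members predict for it.

```python
import math
from statistics import mean, pstdev


def ensemble_uncertainty(predictions):
    """Population standard deviation of one quantity predicted by several models."""
    return pstdev(predictions)


def viability_weight(energy, reference_energy, kT):
    """Hypothetical energetic-viability weight.

    Equals 1 at or below the reference energy and decays exponentially
    with the excess energy above it (assumed form, for illustration only).
    """
    excess = max(0.0, energy - reference_energy)
    return math.exp(-excess / kT)


def select_candidates(candidates, reference_energy, kT, n_select):
    """Return indices of the n_select candidates with the highest score.

    The score is uncertainty * viability (an assumed acquisition function),
    so configurations that are unfamiliar but still energetically
    accessible are ranked first.
    """
    scores = []
    for idx, energies in enumerate(candidates):
        sigma = ensemble_uncertainty(energies)
        weight = viability_weight(mean(energies), reference_energy, kT)
        scores.append((sigma * weight, idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:n_select]]


# Made-up ensemble energies (arbitrary units) for three candidate configurations.
candidates = [
    [-10.00, -10.01, -9.99],  # familiar: models agree closely
    [-9.50, -9.90, -9.10],    # unfamiliar and moderately accessible
    [-2.00, -6.00, 2.00],     # very uncertain but far too high in energy
]
selected = select_candidates(candidates, reference_energy=-10.0, kT=0.5, n_select=2)
```

In a full loop, the selected configurations would be passed to ab initio calculations, the results added to the training set, and the ensemble retrained before the next round of MLIP-MD exploration. With the made-up numbers above, the third candidate has the largest model disagreement but is suppressed by the viability weight, which mirrors the idea of focusing on unfamiliar yet reasonably accessible regions.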