Training Transferable Interatomic Neural Network Potentials for Reactive Chemistry: Improved Chemical Space Sampling

Jason Goodpaster, Quin Hu, Adrian Gordon, Andrew Johanessen, Leqian Tan
DOI: https://doi.org/10.26434/chemrxiv-2024-c375f
2024-03-07
Abstract: Large, condensed-phase, and extended systems remain a challenge for theoretical studies because of the compromise between accuracy and computational cost in their calculations. Machine learning methods are increasingly used to resolve this trade-off by training on large datasets of highly accurate calculations, which are traditionally hard to obtain. The development of interatomic machine learning potentials has made it possible to model high-quality potential energy surfaces with near ab initio accuracy at low computational cost. However, like other machine learning applications, these methods face challenges with training-data quality and transferability, specifically to regions of chemical space beyond their training data. In this work, we continue exploring the use of machine learning methods to build accurate and efficient potential energy surfaces for bond dissociation and reactive chemistry. We examine sampling techniques that allow interatomic neural network potentials designed to model potential energy surfaces, such as ANI and NequIP, to accurately predict bond dissociation energies, model reactive chemistry, and transfer beyond their training data across chemical space.
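Two standard relations help make the abstract's goals concrete. The first is the electronic bond dissociation energy of a bond R-R' broken homolytically into two radicals, written here without zero-point or thermal corrections (which variant the authors use is not stated in this summary). The second is the atom-wise energy decomposition used by both ANI and NequIP, in which each atom i contributes an energy that depends only on its local neighborhood N_i inside a cutoff radius r_c; for NequIP, stacked message-passing layers extend the effective neighborhood beyond a single cutoff.

```latex
\mathrm{BDE}(\mathrm{R{-}R'}) = E(\mathrm{R}^{\bullet}) + E(\mathrm{R'}^{\bullet}) - E(\mathrm{R{-}R'})

E_{\mathrm{total}} = \sum_{i=1}^{N} E_i(\mathcal{N}_i), \qquad
\mathcal{N}_i = \{\, \mathbf{r}_j - \mathbf{r}_i : \lVert \mathbf{r}_j - \mathbf{r}_i \rVert < r_c \,\}
```

Because the BDE is a difference between the intact molecule and its radical fragments, an error in any fragment energy enters the BDE directly. Because each E_i depends only on N_i, a model can transfer to larger molecules assembled from familiar local environments, but it must extrapolate whenever a radical carbon presents a neighborhood that never appeared in training.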
Chemistry
What problem does this paper attempt to address?
This paper discusses how machine learning methods, especially interatomic neural network potentials (INNPs), can resolve the trade-off between accuracy and computational cost in energy calculations for large and complex reacting systems. The authors focus on how well INNPs generalize to chemical space beyond the training dataset, especially for bond breaking and reactive chemistry. They use the ANI (Accurate Neural Network Engine for Molecular Energies) model and the NequIP (Neural Equivariant Interatomic Potentials) model, and improve the sampling technique to make the models' predictions of bond dissociation energies and simulations of reactive chemistry more accurate, as well as to improve generalization across chemical space.

The authors find that although the ANI and NequIP models show some transferability from small to large systems, their transferability remains limited in unseen chemical environments, such as those containing new elements or new functional groups. To address this, they develop a new sampling strategy that targets the geometric configurations of carbon radicals, training the model to predict bond breaking and to extend to regions of chemical space whose local environments are similar to, but distinct from, those in the training set.

The results show that incorporating specific geometric information about carbon radicals into the training data significantly improves the model's accuracy in predicting C-C bond breaking across different alkyl variants and oxygen-functionalized compounds. These results indicate that, with a carefully selected and expanded training dataset, INNP models can generalize better to unseen chemical systems, providing more accurate and efficient computational tools for theoretical chemistry and materials science.