Strategies for the Construction of Machine-Learning Potentials for Accurate and Efficient Atomic-Scale Simulations

April M. Miksch, Tobias Morawietz, Johannes Kästner, Alexander Urban, Nongnuch Artrith
DOI: https://doi.org/10.1088/2632-2153/abfd96
2021-05-06
Abstract: Recent advances in machine-learning interatomic potentials have enabled the efficient modeling of complex atomistic systems with an accuracy that is comparable to that of conventional quantum mechanics based methods. At the same time, the construction of new machine-learning potentials can seem a daunting task, as it involves data-science techniques that are not yet common in chemistry and materials science. Here, we provide a tutorial-style overview of strategies and best practices for the construction of artificial neural network (ANN) potentials. We illustrate the most important aspects of (i) data collection, (ii) model selection, (iii) training and validation, and (iv) testing and refinement of ANN potentials on the basis of practical examples. Current research in the areas of active learning and delta learning are also discussed in the context of ANN potentials. This tutorial review aims at equipping computational chemists and materials scientists with the required background knowledge for ANN potential construction and application, with the intention to accelerate the adoption of the method, so that it can facilitate exciting research that would otherwise be challenging with conventional strategies.
Materials Science
What problem does this paper attempt to address?
The paper addresses how to construct a machine-learning potential (MLP) that simulates complex systems at the atomic scale efficiently and with an accuracy comparable to that of conventional quantum-mechanics-based methods. The authors provide a tutorial-style overview of strategies and best practices for constructing artificial neural network (ANN) potentials, covering data collection, model selection, training and validation, and testing and refinement. They also discuss active learning and delta learning in the context of ANN potentials.

### Main problem analysis

1. **Efficiently modeling complex systems**
   - Conventional first-principles simulations based on quantum mechanics, such as density functional theory (DFT), can predict material properties with quantitative accuracy, but they are computationally expensive and are usually limited to small systems of fewer than 1,000 atoms and to nanosecond time scales.
   - MLPs, and ANN-based MLPs in particular, can reduce the computational cost significantly while keeping an accuracy comparable to that of the reference method, and their cost grows only linearly with the number of atoms.
2. **Technical challenges in constructing MLPs**
   - Constructing a new MLP involves data-science techniques that are not yet common in chemistry and materials science, so the task can seem daunting to researchers in those fields.
   - The paper describes each step of the construction in detail, giving computational chemists and materials scientists the background knowledge they need and thereby accelerating the adoption of MLPs.
3. **Ensuring the reliability and applicability of MLPs**
   - The success of an MLP depends on a high-quality reference data set that covers the entire structural and chemical space required for the target application while containing as few redundant data points as possible.
   - The paper emphasizes that construction is iterative: the reference data set is expanded step by step through active learning until the MLP reaches the desired accuracy in the final test.

### Key steps involved

- **Data collection**: Generate an initial reference data set that includes ideal crystal structures, structures derived from them, ideal structures with modified or scaled lattice parameters, and structures containing defects.
- **Model selection**: Choose the type of descriptor used to characterize the local atomic environment, together with its resolution and cutoff radius; choose the ANN architecture, the number of nodes, and the activation function.
- **Training and validation**: Optimize the model parameters so that the model reproduces the reference data set as closely as possible, and check for over-fitting or under-fitting.
- **Testing and refinement**: Evaluate the accuracy of the trained model and, if necessary, add more data points through active learning and repeat the steps above.

In summary, the paper aims to give researchers a comprehensive and practical guide to constructing and applying machine-learning potentials, and thereby to support further progress in atomic-scale simulation.
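
To make the model-selection step and the linear-scaling argument more concrete, the following Python sketch computes a radial descriptor of each atom's local environment within a cutoff radius and then writes the total energy as a sum of per-atom contributions. Everything specific here is an assumption made for illustration and is not taken from the paper: the Behler-Parrinello-style Gaussian descriptor with a cosine cutoff is only one common choice, the function names and parameter values are hypothetical, and a simple linear model stands in for a trained ANN.

```python
import math


def cosine_cutoff(r, r_cut):
    """Decay smoothly from 1 at r = 0 to 0 at r = r_cut; zero beyond r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_descriptor(positions, i, etas, r_cut):
    """Return one radial value per Gaussian width eta for atom i.

    Assumption: isolated cluster, no periodic boundary conditions.
    The loop visits every other atom for clarity; a production code would
    use a neighbor list so that the work per atom stays constant.
    """
    values = []
    for eta in etas:
        total = 0.0
        for j, pos_j in enumerate(positions):
            if j == i:
                continue
            r = math.dist(positions[i], pos_j)
            total += math.exp(-eta * r * r) * cosine_cutoff(r, r_cut)
        values.append(total)
    return values


def toy_atomic_energy(descriptor, weights, bias):
    """Hypothetical stand-in for a trained per-atom ANN (a linear model)."""
    return bias + sum(w * g for w, g in zip(weights, descriptor))


def total_energy(positions, etas, r_cut, weights, bias):
    """Total energy as a sum of atomic energies, one term per atom."""
    return sum(
        toy_atomic_energy(radial_descriptor(positions, i, etas, r_cut), weights, bias)
        for i in range(len(positions))
    )


# Illustrative values only: three nearby atoms plus one atom far outside the cutoff.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (10.0, 10.0, 10.0)]
etas = [0.5, 2.0]          # descriptor "resolution": one value per Gaussian width
r_cut = 4.0                # cutoff radius
weights = [-0.3, 0.1]      # hypothetical model parameters
bias = -1.0

energy = total_energy(positions, etas, r_cut, weights, bias)
print("toy total energy:", energy)
```

Because the descriptor depends only on interatomic distances inside the cutoff, the resulting energy does not change when the structure is translated or when the atoms are reordered, and an atom with no neighbors inside the cutoff contributes only the bias term. The brute-force neighbor loop used here scales quadratically with the number of atoms; the linear scaling mentioned above requires a neighbor list, so that each atom interacts with a bounded number of neighbors.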