Constructing machine learning potentials with active learning

Cheng Shang, Zhi-Pan Liu
DOI: https://doi.org/10.1016/b978-0-323-90049-2.00018-4
2023-01-01
Abstract: A high-quality training dataset is essential to the success of machine learning (ML) applications. For ML potentials that describe multidimensional potential energy surfaces (PESs), an ideal training set should be large enough to include all representative atomic configurations of interest, yet as small as possible to limit the cost of the underlying quantum chemistry calculations. The traditional way of generating a training set is often inefficient and empirical and requires intensive manual effort. It can also introduce high redundancy in the geometrical features of low-energy structures, and the resulting imbalance in data density can cause overfitting. In this chapter, we introduce the active learning (AL) algorithm, a subclass of supervised ML, for generating ML potentials, with the aim of automatically optimizing the quality of the training dataset. Three widely used AL strategies for expanding the training set are presented and discussed in connection with their applications to different ML potentials. Using the Li system as an example, we also illustrate how the AL algorithm helps build a high-quality training dataset for training a global neural network (G-NN) potential.
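The abstract does not name the chapter's three strategies for expanding the training set. As a minimal sketch, assuming query by committee (one widely used uncertainty-based selection scheme), the loop below repeatedly sends to the expensive reference calculation only the candidate structure on which a committee of models disagrees most. Everything here is hypothetical: a 1D Morse curve (`reference_energy`) stands in for the quantum chemistry calculation, low-degree polynomial fits stand in for independently trained NN potentials, and the names `fit_polynomial`, `active_learning`, `degrees`, and `tol` are illustrative, not taken from the chapter or from any G-NN code.

```python
import math
import statistics


def reference_energy(x):
    """Hypothetical stand-in for an expensive quantum chemistry calculation:
    a 1D Morse curve with unit well depth and its minimum at x = 1."""
    return (1.0 - math.exp(-1.5 * (x - 1.0))) ** 2


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (toy committee member)."""
    n = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution for the coefficients c_0 ... c_degree.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        rest = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - rest) / a[row][row]
    return lambda x: sum(c * x ** j for j, c in enumerate(coeffs))


def active_learning(candidates, initial_x, n_rounds, degrees=(1, 2, 3), tol=1e-3):
    """Query-by-committee loop: label the candidate the committee disagrees on most."""
    train_x = list(initial_x)
    train_y = [reference_energy(x) for x in train_x]
    for _ in range(n_rounds):
        committee = [fit_polynomial(train_x, train_y, d) for d in degrees]
        pool = [x for x in candidates if x not in train_x]
        if not pool:
            break
        # Committee disagreement (population std. dev. of predictions) per candidate.
        scores = [statistics.pstdev([model(x) for model in committee]) for x in pool]
        if max(scores) < tol:  # every remaining candidate is already well described
            break
        best = pool[scores.index(max(scores))]
        train_x.append(best)
        train_y.append(reference_energy(best))  # the only expensive call per round
    return train_x, train_y


candidates = [0.5 + 0.05 * i for i in range(61)]  # grid from 0.5 to 3.5
initial = [candidates[i] for i in (10, 20, 30, 40)]  # x ≈ 1.0, 1.5, 2.0, 2.5
selected, energies = active_learning(candidates, initial, n_rounds=5)
print(selected[len(initial):])  # points chosen by the active-learning loop
```

In this toy setting, each round adds at most one new reference calculation, and the `tol` threshold ends the loop once the committee agrees on every remaining candidate, so labels are spent where the models are uncertain rather than on configurations that are already well represented.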