Exploring chemical and conformational spaces by batch mode deep active learning

Viktor Zaverkin, David Holzmüller, Ingo Steinwart, Johannes Kästner
DOI: https://doi.org/10.1039/d2dd00034b
2022-08-02
Digital Discovery
Abstract: The development of machine-learned interatomic potentials requires generating sufficiently expressive atomistic data sets. Active learning algorithms select data points on which labels, i.e., energies and forces, are calculated for inclusion in the training set. However, for batch mode active learning, i.e., when multiple data points are selected at once, conventional active learning algorithms can perform poorly. Therefore, we investigate algorithms specifically designed for this setting and show that they can outperform conventional algorithms. We investigate selection based on the informativeness, diversity, and representativeness of the resulting training set. We propose using gradient features specific to atomistic neural networks to evaluate the informativeness of queried samples, including several approximations allowing for their efficient evaluation. To avoid selecting similar structures, we present several methods that enforce the diversity and representativeness of the selected batch. Finally, we apply the proposed approaches to several molecular and periodic bulk benchmark systems and argue that they can be used to generate highly informative atomistic data sets by running any atomistic simulation.
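The contrast the abstract draws between conventional and batch-aware selection can be illustrated with a toy sketch. It is not the paper's algorithm: the feature vectors are hypothetical stand-ins for the gradient features, the scores are made-up informativeness values, and the greedy max-min distance rule is just one generic way to enforce diversity (assuming the batch size does not exceed the pool size).

```python
import numpy as np


def select_top_k(scores, k):
    """Naive batch: the k highest-scoring candidates, ignoring similarity."""
    return np.argsort(-scores, kind="stable")[:k].tolist()


def select_greedy_diverse(features, scores, k):
    """Greedy max-min distance selection, seeded by the highest score.

    Illustrative only: each step adds the candidate that lies farthest
    from everything already selected, so near-duplicates are skipped.
    """
    selected = [int(np.argmax(scores))]
    # Distance from every candidate to its nearest selected point.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


# Hypothetical pool: three near-identical structures with high scores,
# plus two distinct structures with lower scores.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [10.0, 0.0]])
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2])

top_k_batch = select_top_k(scores, 3)
diverse_batch = select_greedy_diverse(features, scores, 3)
print("top-k batch:  ", top_k_batch)
print("diverse batch:", diverse_batch)
```

On this toy pool the naive batch consists of the three near-duplicate structures, while the greedy rule keeps one of them and adds the two distinct ones. This is the failure mode of conventional selection that the abstract refers to.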