Using physical property surrogate models to perform accelerated multi-fidelity optimization of force field parameters

Owen Madin, Michael Shirts
DOI: https://doi.org/10.1039/d2dd00138a
2023-05-06
Digital Discovery
Abstract: Van der Waals dispersion-repulsion interactions, commonly represented in atomistic force fields by the Lennard-Jones (LJ) potential, play an important role in the accuracy of molecular simulations. Training the force field parameters used in the LJ potential is challenging, generally requiring adjustment based on simulations of macroscopic physical properties. The large computational expense of these simulations, especially if many parameters are trained simultaneously, limits the size of the training data set and the number of optimization steps that can be taken, often requiring modelers to perform optimizations within a local parameter region. To allow for more global LJ parameter optimization against large training sets, we introduce a multi-fidelity optimization technique that uses Gaussian process surrogate modeling to build inexpensive models of physical properties as a function of LJ parameters. This allows for fast evaluation of objective functions, greatly accelerating searches over parameter space and enabling the use of global optimization algorithms. We use an iterative framework that performs optimization with differential evolution at the surrogate level, followed by validation at the simulation level and surrogate refinement. Using this technique on two previously studied training sets, containing up to 195 physical property targets, we refit a subset of the LJ parameters for the OpenFF 1.0.0 (Parsley) force field. We demonstrate that this multi-fidelity technique can find improved parameter sets compared to a purely simulation-based optimization by searching more broadly and escaping local minima. In particular, this technique often finds significantly different parameter minima that have comparably accurate performance. In most cases, these parameter sets are transferable to other similar molecules in a test set. This multi-fidelity technique provides a platform for fast optimization against physical properties that can be refined and applied in multiple ways to the development of molecular models.
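The iterative framework described in the abstract (surrogate fit, differential evolution at the surrogate level, validation at the simulation level, surrogate refinement) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `simulate` function stands in for an expensive molecular simulation, the parameter bounds and `TARGET` value are hypothetical, the Gaussian process uses a fixed RBF kernel with hand-picked hyperparameters, and the differential evolution routine is a small hand-written DE/rand/1/bin variant rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
BOUNDS = np.array([[0.05, 0.40], [2.5, 4.5]])  # hypothetical (epsilon, sigma) search box
TARGET = 1.0                                    # hypothetical experimental property value

def simulate(theta):
    """Toy stand-in for an expensive simulation of one physical property."""
    eps, sig = theta
    return 10.0 * eps * (sig / 3.5) ** 3

def objective(prop):
    """Squared deviation of a property value from its target."""
    return (prop - TARGET) ** 2

def scale(X):
    """Map parameters to the unit box so one kernel length scale suits both."""
    return (np.atleast_2d(X) - BOUNDS[:, 0]) / (BOUNDS[:, 1] - BOUNDS[:, 0])

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between two sets of scaled points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def fit_gp(X, y, noise=1e-6):
    """Return a predictor for the GP posterior mean (fixed hyperparameters)."""
    Xs, mean = scale(X), y.mean()
    alpha = np.linalg.solve(rbf(Xs, Xs) + noise * np.eye(len(y)), y - mean)
    return lambda Q: mean + rbf(scale(Q), Xs) @ alpha

def differential_evolution(f, pop_size=30, gens=60, F=0.7, CR=0.9):
    """Minimal DE/rand/1/bin minimizer over BOUNDS; f maps (n, d) -> (n,)."""
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    cost = f(pop)
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            others = np.delete(np.arange(pop_size), i)
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            cross = rng.random(len(lo)) < CR
            cross[rng.integers(len(lo))] = True  # always take one mutant component
            trial = np.clip(np.where(cross, a + F * (b - c), pop[i]), lo, hi)
            trial_cost = f(trial[None, :])[0]
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    return pop[np.argmin(cost)]

def multi_fidelity_optimize(n_init=10, n_iter=8):
    """Alternate cheap surrogate-level searches with simulation-level checks."""
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))   # initial simulated designs
    y = np.array([simulate(t) for t in X])
    for _ in range(n_iter):
        predict = fit_gp(X, y)                         # build or refine surrogate
        theta = differential_evolution(lambda Q: objective(predict(Q)))
        X = np.vstack([X, theta])                      # validate candidate by simulation
        y = np.append(y, simulate(theta))              # and add it to the training data
    best = np.argmin(objective(y))
    return X[best], objective(y[best]), X, y

best_theta, best_obj, X, y = multi_fidelity_optimize()
print("best parameters:", best_theta, "simulated objective:", best_obj)
```

In this sketch the surrogate models the property itself rather than the objective, mirroring the abstract's description of building models of physical properties as a function of LJ parameters; the objective is then computed from the surrogate prediction. Because the toy property has a whole curve of parameter pairs that hit the target, different runs can return quite different parameters with similar objective values.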