OPES (on-the-fly probability enhanced sampling)
©️ Copyright 2023 @ Authors
Author:Pengchao Zhang 📨
Date:2023-12
License: This work is licensed under a Creative Commons Attribution-NonCommercial Use-ShareAlike 4.0 International License.
🎉 We will explain the OPES (on-the-fly probability enhanced sampling) method based on the references
- Rethinking Metadynamics: From Bias Potentials to Probability Distributions
- Exploration vs Convergence Speed in Adaptive-Bias Enhanced Sampling
If you are not familiar with enhanced sampling, here are some references:
- Metadynamics
- Enhancing Important Fluctuations: Rare Events and Metadynamics from a Conceptual Viewpoint
- Using metadynamics to explore complex free-energy landscapes
- Enhanced Sampling Methods
- PLUMED Tutorials
- RiD
If you can already largely manage the above, let's get started! This Notebook is cataloged below:
Fundamental knowledge
Considering a system with an interaction potential , where denotes the atomic coordinates. Sampling is accelerated by adding a bias potential that depends on via a set of collective variables (CVs), . The CVs are chosen so as to describe the modes of the system that are more difficult to sample. The choice of a proper set of CVs is critical, as it determines the efficiency of the method. The properties of the unbiased system are then calculated by using a reweighting procedure.
In fact the unbiased probability density can be written as an average over the biased ensemble: where is the inverse temperature.
In this way it is also possible to reconstruct the free energy surface (FES), defined as
Bias can be builded by adding at fixed intervals repulsive Gaussians centered at the instantaneous point sampled. At the -th iteration the bias is given by: where the parameter is called the bias factor, and the Gaussian function is defined as with height and variance set by the user. Typically only diagonal variances are employed. At convergence there is a simple relation between the bias and the free energy and the sampled ensemble is a smoothed version of the unbiased one, with FES barriers lowered by a factor .
Reconstruct the unbiased probability
Enhanced sampling based on the probability reconstruction is not a new idea. It was first proposed by the adaptive umbrella sampling method. Typically, in such methods the bias at -th iteration is defined as: where is an estimate of the probability obtained via a weighted histogram or some more elaborate method.
In building our method we will introduce few key differences that come from the long experience with MetaD.
First we would like to introduce explicitly a target distribution , that will be sampled once the method reaches convergence. This can be obtained with the following bias: In adaptive umbrella sampling the target distribution is uniform while in MetaD it is the well-tempered distribution
Here we express the target distribution as a function of the unbiased one, , we only need to estimate via reweighting in order to calculate the bias.
We build our probability distribution estimate on the fly by periodically depositing Gaussians, known as kernel density estimation (KDE). Each new Gaussian is weighted according to the previously deposited bias potential: where the weights are given by
is not properly normalized, and we will take care of the normalization separately.
The are Gaussians as those defined previously for MetaD, with diagonal variance and fixed height In KDE the most relevant parameter is the bandwidth, i.e., the width of the Gaussians.
A good choice of the bandwidth should depend on the amount of available data: the larger the sampling the smaller the bandwidth. Thus we choose to shrink the bandwidth as the simulation proceeds according to the popular Silverman's rule of thumb.
At -th iteration: where is the initial standard deviation estimated from a short unbiased simulation, is the dimensionality of the CV space, and is the effective sample size.
Update bias based on probability
We can now discuss the normalization problem.
should be normalized not with respect to the full CV space, but only over the CV space actually explored up to step , that we call . Thus we introduce the normalization factor that will change over time, as the system explores new regions of the CV space, and it will have an impact in the biasing scheme. This impact becomes particularly relevant in CV spaces of dimension , since the volume explored grows with a power of .
To estimate we take advantage of our compressed kernels representation, and consider the centers of the kernels as points for a Monte Carlo integration where are the compressed Gaussians, is their total number, and is the global normalization of the KDE.
Finally we can explicitly write the bias at the -th step as: where can be seen as a regularization term that ensures the argument of the logarithm is always greater than zero. We notice that the addition of this term is not merely a technicality to solve a numerical issue, but rather it allows one to set a limit to the bias, thus providing a better control over the desired exploration. It can be chosen to be where is the height of the free energy barriers one wishes to overcome.
On-the-fly probability?
If you are having difficulties following the above instructions, allow me to summarize six steps to aid in understanding the essential aspects of the OPES method.
The only input parameter considered is for the rare event.
The initial bias , i.e., minimum bias, is obtained as .
The Gaussian kernel is constructed using this above bias.
The unbiased probability is then constructed based on a weighted KDE approach (Reweighting!).
Update the bias based on the above and guidelines (Biasing!).
Repeat steps 2 to 5 until the regime is quasi-static, meaning that the the unbiased probability in a on-the-fly way.
- Finally, the free energy profile can be calculated through unbiased probability .
OPES-explore
OPES method focuses on fast convergence (e.g. using DPMD to calculate the free energy profiles), but there are cases where fast exploration is preferred instead (e.g. using DP-GEN iterations to build training datesets).
For this reason, we introduce a new variant of the OPES method that focuses on quickly escaping metastable states at the expense of convergence speed.
In formulating OPES-explore, we restrict ourselves to the case of using as target the well-tempered distribution
In OPES-explore, one builds the bias starting from the on-the-fly estimate of the distribution that is being sampled in the biased simulation: where is the CVs value sampled at step .
As the simulation converges, approaches the target well-tempered distribution . Thus, we use the approximation and write the bias:
Both OPES variants are applications of the general bias potential but OPES estimates on-the-fly and uses it to calculate the bias, while OPES-explore does the same but with .
The free energy surface as a function of the CVs can be estimated in two distinct ways, either directly from the probability estimate or via importance sampling reweighting, e.g., using a weighted KDE,
Differences between OPES and OPES-explore
The idea of defining the bias potential is similar, i.e., using as target the well-tempered distribution .
The way of estimating the probability distribution are different. Specifically, unbiased probability is estimated on-the-fly using weighted KDE in OPES (NOTE: the bias is well-tempered in OPES), while the well-tempered (biased) probability ) is estimated based on averaged KDE in OPES-explore.
Therefore, OPES method focuses on fast convergence, while OPES-explore focuses on fast exploration.
📖 This Notebook summarizes the OPES methodology based on the author's reading of the original literature and practical experience.
🎉 If you can't get enough of above, then you may want to read it:
Extension of the OPES method:
- Unified Approach to Enhanced Sampling
- Rare Event Kinetics from Adaptive Bias Enhanced Sampling
- OneOPES, a Combined Enhanced Sampling Method to Rule Them All
Application of the OPES method:
- Reactant-induced dynamics of lithium imide surfaces during the ammonia decomposition process
- Water regulates the residence time of Benzamidine in Trypsin
- How and When Does an Enzyme React? Unraveling α-Amylase Catalytic Activity with Enhanced Sampling Techniques
- Intramolecular and water mediated tautomerism of solvated glycine
- Toturial: Intramolecular and water mediated tautomerism of solvated glycine
- Propensity of water self-ions at air(oil)-water interfaces revealed by deep potential molecular dynamics with enhanced sampling
- Notebook: Voronoi CVs for enhanced sampling autoionization and tautomerism