OPES (on-the-fly probability enhanced sampling)

Enhanced Sampling

Molecular Dynamics

English

pc-zhang20@foxmail.com

zhangpengchao@dp.tech

Published on 2023-11-15

Recommended image: Basic Image:bohrium-notebook:2023-04-07

Recommended machine type: c2_m4_cpu

©️ Copyright 2023 @ Authors
Author：Pengchao Zhang 📨
Date：2023-12
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

🎉 We will explain the OPES (on-the-fly probability enhanced sampling) method based on the references

If you are not familiar with enhanced sampling, here are some references:

If you are already comfortable with the material above, let's get started! The contents of this Notebook are listed below:

💭Catalogs

Fundamental knowledge
Reconstruct the unbiased probability
Update bias based on probability
On-the-fly probability?
OPES-explore
Differences between OPES and OPES-explore

Fundamental knowledge

Consider a system with an interaction potential $U(R)$, where $R$ denotes the atomic coordinates. Sampling is accelerated by adding a bias potential $V(s)$ that depends on $R$ via a set of collective variables (CVs), $s = s(R)$. The CVs are chosen to describe the modes of the system that are most difficult to sample. The choice of a proper set of CVs is critical, as it determines the efficiency of the method. The properties of the unbiased system are then calculated by using a reweighting procedure.

In fact, the unbiased probability density $P(s) = \langle \delta[s - s(R)] \rangle \propto \int dR \, e^{-\beta U(R)} \delta[s - s(R)]$ can be written as an average over the biased ensemble: $P(s) = \frac{\langle \delta[s - s(R)] \, e^{\beta V(s)} \rangle_{V}}{\langle e^{\beta V(s)} \rangle_{V}}$, where $\beta$ is the inverse temperature.

In this way it is also possible to reconstruct the free energy surface (FES), defined as $F(s) = -\frac{1}{\beta} \log P(s)$.
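
The reweighting formula can be illustrated with a small histogram-based sketch. The function name `reweighted_fes`, the binning scheme, and the toy data are assumptions made for illustration only; each biased sample is simply counted with weight $e^{\beta V(s)}$ and the FES is shifted so that its minimum is zero.

```python
import math

def reweighted_fes(samples, bias_values, beta, edges):
    """Estimate F(s) per bin from biased samples, weighting each by exp(beta * V)."""
    nbins = len(edges) - 1
    hist = [0.0] * nbins
    for s, v in zip(samples, bias_values):
        for b in range(nbins):
            if edges[b] <= s < edges[b + 1]:
                hist[b] += math.exp(beta * v)  # reweighting factor e^{beta V(s)}
                break
    total = sum(hist)
    fes = [-math.log(h / total) / beta if h > 0 else math.inf for h in hist]
    fmin = min(fes)
    return [f - fmin for f in fes]  # shift so that min F = 0

# Toy data (hypothetical): the bias flattens a 2 kT difference between two
# states, so the biased run visits both bins equally often.
samples = [0.25] * 50 + [0.75] * 50
bias_values = [0.0] * 50 + [-2.0] * 50
fes = reweighted_fes(samples, bias_values, beta=1.0, edges=[0.0, 0.5, 1.0])
print(fes)
```

In this toy case the reweighted FES recovers the 2 kT difference that the bias had removed from the sampled histogram.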

In well-tempered metadynamics (MetaD), the bias $V(s)$ is built by adding, at fixed intervals, repulsive Gaussians centered at the instantaneously sampled point. At the $n$-th iteration the bias is given by $V_{n}(s) = \sum_{k}^{n} e^{-\beta V_{k-1}(s_{k})/(\gamma - 1)} G(s, s_{k})$, where the parameter $\gamma > 1$ is called the bias factor, and the Gaussian function is defined as $G(s, s') = h \exp[-\frac{1}{2}(s - s')^{T} \Sigma^{-1} (s - s')]$, with height $h$ and variance $\Sigma$ set by the user. Typically only diagonal variances $\Sigma_{ij} = \sigma_{i}^{2} \delta_{ij}$ are employed. At convergence there is a simple relation between the bias and the free energy, $V(s) = -(1 - 1/\gamma) F(s)$, and the sampled ensemble is a smoothed version of the unbiased one, with FES barriers lowered by a factor $\gamma$.
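
A minimal 1D sketch of this Gaussian deposition rule is given below. The helper names `metad_bias` and `deposit` and all numerical values are hypothetical; the point is only to show how the height of each new Gaussian is scaled by $e^{-\beta V_{k-1}(s_k)/(\gamma - 1)}$.

```python
import math

def metad_bias(s, centers, heights, sigma):
    """Sum of the Gaussians deposited so far, evaluated at s (1D CV)."""
    return sum(h * math.exp(-0.5 * ((s - c) / sigma) ** 2)
               for c, h in zip(centers, heights))

def deposit(s_new, centers, heights, h0, sigma, beta, gamma):
    """Add one Gaussian at s_new with the well-tempered height scaling."""
    v_prev = metad_bias(s_new, centers, heights, sigma)  # V_{k-1}(s_k)
    heights.append(h0 * math.exp(-beta * v_prev / (gamma - 1.0)))
    centers.append(s_new)

# Hypothetical parameters: two deposits at the same CV value.
h0, sigma, beta, gamma = 1.0, 0.1, 1.0, 10.0
centers, heights = [], []
deposit(0.0, centers, heights, h0, sigma, beta, gamma)
deposit(0.0, centers, heights, h0, sigma, beta, gamma)
print(heights)
```

The second Gaussian is lower than the first because the bias already present at that point damps its height.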

Reconstruct the unbiased probability

Enhanced sampling based on probability reconstruction is not a new idea. It was first proposed in the adaptive umbrella sampling method. Typically, in such methods the bias at the $n$-th iteration is defined as $V_{n}(s) = \frac{1}{\beta} \log \hat{P}_{n}(s)$, where $\hat{P}_{n}(s)$ is an estimate of the probability obtained via a weighted histogram or some more elaborate method.

In building our method we will introduce a few key differences that come from long experience with MetaD.

First, we would like to introduce explicitly a target distribution $p^{tg}(s)$ that will be sampled once the method reaches convergence. This can be obtained with the following bias: $V(s) = -\frac{1}{\beta} \log \frac{p^{tg}(s)}{P(s)}$. In adaptive umbrella sampling the target distribution is uniform, $p^{tg}(s) \propto 1$, while in MetaD it is the well-tempered distribution, $p^{tg}(s) = p^{WT}(s) \propto [P(s)]^{1/\gamma}$.

Since the target distribution is expressed as a function of the unbiased one, $p^{tg}(s) \propto [P(s)]^{1/\gamma}$, we only need to estimate $P(s)$ via reweighting in order to calculate the bias.
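
As a short worked check, the two lines below show why this bias samples the target distribution and what it reduces to for the well-tempered target. The symbol $P_V(s)$, denoting the distribution sampled in the biased ensemble, is notation introduced here for illustration.

```latex
\begin{aligned}
P_V(s) &\propto P(s)\, e^{-\beta V(s)}
        = P(s)\, \frac{p^{tg}(s)}{P(s)} = p^{tg}(s), \\
V(s) &= -\frac{1}{\beta} \log \frac{[P(s)]^{1/\gamma}}{P(s)} + \text{const}
      = \left(1 - \frac{1}{\gamma}\right) \frac{1}{\beta} \log P(s) + \text{const}.
\end{aligned}
```

Setting $\gamma \to \infty$ recovers the uniform target of adaptive umbrella sampling, $V(s) = \frac{1}{\beta} \log P(s)$ up to a constant.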

We build our probability distribution estimate on the fly by periodically depositing Gaussians, a procedure known as kernel density estimation (KDE). Each new Gaussian is weighted according to the previously deposited bias potential: $\tilde{P}_{n}(s) = \frac{\sum_{k}^{n} w_{k} G(s, s_{k})}{\sum_{k}^{n} w_{k}}$, where the weights $w_{k}$ are given by $w_{k} = e^{\beta V_{k-1}(s_{k})}$.

$\tilde{P}_{n} (s)$ is not properly normalized, and we will take care of the normalization separately.

The $G(s, s_{k})$ are Gaussians like those defined previously for MetaD, with diagonal variance $\Sigma_{ij} = \sigma_{i}^{2} \delta_{ij}$ and fixed height $h = \prod_{i} (\sigma_{i} \sqrt{2\pi})^{-1}$. In KDE the most relevant parameter is the bandwidth, i.e., the width of the Gaussians.
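
The weighted KDE can be sketched in a few lines for a 1D CV. The function names and the toy kernel positions are assumptions for illustration; the kernel is a normalized Gaussian and each center is weighted by $e^{\beta V_{k-1}(s_k)}$, the bias felt when it was deposited.

```python
import math

def gaussian_kernel(s, center, sigma):
    """Normalized 1D Gaussian, i.e. height 1 / (sigma * sqrt(2 pi))."""
    return (math.exp(-0.5 * ((s - center) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))

def weighted_kde(s, centers, prev_bias, sigma, beta):
    """P~_n(s): kernels weighted by exp(beta * V_{k-1}(s_k))."""
    weights = [math.exp(beta * v) for v in prev_bias]
    num = sum(w * gaussian_kernel(s, c, sigma) for w, c in zip(weights, centers))
    return num / sum(weights)

# Hypothetical data: two kernels on the left, one on the right.
centers = [-1.0, -1.0, 1.0]
no_bias = [0.0, 0.0, 0.0]
ratio_unweighted = (weighted_kde(-1.0, centers, no_bias, 0.1, 1.0)
                    / weighted_kde(1.0, centers, no_bias, 0.1, 1.0))

# If the right-hand sample was collected under a bias of kT*log(2), its weight
# doubles and the two peaks become equally probable after reweighting.
with_bias = [0.0, 0.0, math.log(2.0)]
ratio_reweighted = (weighted_kde(-1.0, centers, with_bias, 0.1, 1.0)
                    / weighted_kde(1.0, centers, with_bias, 0.1, 1.0))
print(ratio_unweighted, ratio_reweighted)
```

Samples collected where the bias was high count more, which is how the unbiased $P(s)$ is recovered from a biased trajectory.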

A good choice of the bandwidth should depend on the amount of available data: the larger the sampling the smaller the bandwidth. Thus we choose to shrink the bandwidth as the simulation proceeds according to the popular Silverman's rule of thumb.

At the $n$-th iteration: $\sigma_{i}^{(n)} = \sigma_{i}^{(0)} [N_{eff}^{(n)} (d + 2)/4]^{-1/(d + 4)}$, where $\sigma_{i}^{(0)}$ is the initial standard deviation estimated from a short unbiased simulation, $d$ is the dimensionality of the CV space, and $N_{eff}^{(n)} = \frac{(\sum_{k}^{n} w_{k})^{2}}{\sum_{k}^{n} w_{k}^{2}}$ is the effective sample size.
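
The bandwidth rule is easy to evaluate directly. The sketch below, with hypothetical function names, computes the effective sample size from a list of weights and applies the shrinking formula.

```python
def effective_sample_size(weights):
    """N_eff = (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

def silverman_sigma(sigma0, weights, d):
    """Bandwidth after shrinking sigma0 according to N_eff and dimension d."""
    n_eff = effective_sample_size(weights)
    return sigma0 * (n_eff * (d + 2) / 4.0) ** (-1.0 / (d + 4))

equal = [1.0] * 100                 # 100 equally weighted kernels
skewed = [1.0] + [0.0] * 99         # one kernel dominates completely
n_equal = effective_sample_size(equal)
n_skewed = effective_sample_size(skewed)
sigma_equal = silverman_sigma(1.0, equal, d=1)
print(n_equal, n_skewed, sigma_equal)
```

With equal weights $N_{eff}$ equals the number of kernels, whereas strongly uneven weights reduce it and therefore keep the bandwidth larger.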

Update bias based on probability

We can now discuss the normalization problem.

$\tilde{P}_{n}(s)$ should be normalized not with respect to the full CV space, but only over the region of CV space actually explored up to step $n$, which we call $\Omega_{n}$, with volume $|\Omega_{n}|$. Thus we introduce the normalization factor $Z_{n} = \frac{1}{|\Omega_{n}|} \int_{\Omega_{n}} \tilde{P}_{n}(s) \, ds$, which changes over time as the system explores new regions of the CV space and has an impact on the biasing scheme. This impact becomes particularly relevant in CV spaces of dimension $d \gg 1$, since the explored volume $|\Omega_{n}|$ grows with a power of $d$.

To estimate $Z_{n}$ we take advantage of our compressed kernels representation, and consider the centers of the kernels as points for a Monte Carlo integration: $Z_{n} = \frac{1}{N} \sum_{k}^{N} \tilde{P}(s_{k}) = \frac{1}{NS} \sum_{k,k'}^{N} G(s_{k}, s_{k'})$, where $G(s_{k}, s_{k'})$ are the compressed Gaussians, $N$ is their total number, and $S = \sum_{k}^{n} w_{k}$ is the global normalization of the KDE.

Finally, we can explicitly write the bias at the $n$-th step as $V_{n}(s) = (1 - 1/\gamma) \frac{1}{\beta} \log \left( \frac{\tilde{P}_{n}(s)}{Z_{n}} + \epsilon \right)$, where $\epsilon \ll 1$ can be seen as a regularization term that ensures the argument of the logarithm is always greater than zero. We notice that the addition of this term is not merely a technicality to solve a numerical issue, but rather it allows one to set a limit to the bias, thus providing better control over the desired exploration. It can be chosen to be $\epsilon = e^{-\beta \Delta E/(1 - 1/\gamma)}$, where $\Delta E$ is the height of the free energy barriers one wishes to overcome.
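
The role of $\epsilon$ as a limit on the bias can be checked numerically. The function `opes_bias` and the parameter values below are illustrative assumptions; the sketch evaluates the bias formula where nothing has been sampled ($\tilde{P}_n = 0$) and at a typical explored point ($\tilde{P}_n / Z_n = 1$).

```python
import math

def opes_bias(p_tilde, z, beta, gamma, delta_e):
    """V_n(s) = (1 - 1/gamma) / beta * log(P~_n(s) / Z_n + eps)."""
    prefactor = (1.0 - 1.0 / gamma) / beta
    eps = math.exp(-beta * delta_e / (1.0 - 1.0 / gamma))
    return prefactor * math.log(p_tilde / z + eps)

beta, gamma, delta_e = 1.0, 10.0, 10.0   # hypothetical values, energies in kT
v_unexplored = opes_bias(0.0, 1.0, beta, gamma, delta_e)  # region never visited
v_typical = opes_bias(1.0, 1.0, beta, gamma, delta_e)     # P~_n equal to Z_n
print(v_unexplored, v_typical)
```

In unexplored regions the bias bottoms out at $-\Delta E$, so the bias can never push the system over barriers much higher than the chosen $\Delta E$.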

On-the-fly probability?

If you are having difficulty following the derivation above, the six steps below summarize the essential aspects of the OPES method.

1. The only input parameter considered is $\Delta E$, the height of the free energy barrier of the rare event.
2. The initial bias $V$, i.e., the minimum bias, is $-\Delta E$.
3. A Gaussian kernel is constructed and weighted using the bias above.
4. The unbiased probability $P$ is then estimated with a weighted KDE approach (Reweighting!).
5. The bias is updated based on the above $P$ and the bias expression (Biasing!).
6. Steps 2 to 5 are repeated until the bias is quasi-static; in this way the unbiased probability $P$ is estimated on the fly.

Finally, the free energy profile can be calculated from the unbiased probability $P$.
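
The steps above can be strung together in a toy 1D simulation. This is only a sketch under several assumptions that are not part of the method description: a double-well potential $U(x) = 5(x^2 - 1)^2$, overdamped Langevin dynamics with unit mobility, a fixed bandwidth (no Silverman shrinking), no kernel compression (so the weights appear explicitly in the estimate of $Z_n$), and arbitrary parameter values. It is not the PLUMED implementation.

```python
import math
import random

def run_opes_1d(n_steps=5000, stride=50, beta=1.0, gamma=10.0,
                delta_e=8.0, sigma=0.1, dt=0.005, seed=0):
    """Toy OPES run on U(x) = 5 (x^2 - 1)^2; returns kernels and the final bias."""
    rng = random.Random(seed)
    prefactor = (1.0 - 1.0 / gamma) / beta
    eps = math.exp(-beta * delta_e / (1.0 - 1.0 / gamma))  # step 1: set by delta_e
    height = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    centers, weights = [], []
    sum_w = 0.0        # S = sum_k w_k
    double_sum = 0.0   # sum_{k,k'} w_k' G(s_k, s_k')
    z = 1.0            # Z_n, only meaningful once a kernel exists

    def kernel(a, b):
        return height * math.exp(-0.5 * ((a - b) / sigma) ** 2)

    def prob_and_grad(s):
        """Weighted KDE estimate P~_n(s) and its derivative (step 4)."""
        if not centers:
            return 0.0, 0.0
        p = dp = 0.0
        for c, w in zip(centers, weights):
            g = w * kernel(s, c)
            p += g
            dp -= g * (s - c) / sigma ** 2
        return p / sum_w, dp / sum_w

    def bias(s):
        """Step 5; with no kernels this returns -delta_e (step 2)."""
        p, _ = prob_and_grad(s)
        return prefactor * math.log(p / z + eps)

    def bias_force(s):
        p, dp = prob_and_grad(s)
        return -prefactor * (dp / z) / (p / z + eps)

    x = -1.0
    for step in range(1, n_steps + 1):
        force = -20.0 * x * (x * x - 1.0) + bias_force(x)
        x += force * dt + math.sqrt(2.0 * dt / beta) * rng.gauss(0.0, 1.0)
        if step % stride == 0:
            w = math.exp(beta * bias(x))  # step 3: weight from the current bias
            cross = sum(kernel(x, c) for c in centers)
            mixed = sum(wk * kernel(x, c) for c, wk in zip(centers, weights))
            double_sum += w * cross + mixed + w * height
            centers.append(x)
            weights.append(w)
            sum_w += w
            z = double_sum / (len(centers) * sum_w)  # Monte Carlo estimate of Z_n
    return centers, weights, bias

delta_e = 8.0
centers, weights, bias = run_opes_1d(delta_e=delta_e)
grid = [-1.5 + 0.1 * i for i in range(31)]
bias_on_grid = [bias(s) for s in grid]
print(len(centers), min(bias_on_grid), max(bias_on_grid))
```

Whatever the trajectory does, the bias on the grid never drops below $-\Delta E$, which is the floor set by $\epsilon$. A free energy estimate would follow from the same kernels as $-\frac{1}{\beta} \log \tilde{P}_n(s)$.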

OPES-explore

The OPES method focuses on fast convergence (e.g., when using DPMD to calculate free energy profiles), but there are cases where fast exploration is preferred instead (e.g., when using DP-GEN iterations to build training datasets).

For this reason, we introduce a new variant of the OPES method that focuses on quickly escaping metastable states at the expense of convergence speed.

In formulating OPES-explore, we restrict ourselves to the case of using as target the well-tempered distribution $p^{tg}(s) = p^{WT}(s) \propto [P(s)]^{1/\gamma}$.

In OPES-explore, one builds the bias starting from the on-the-fly estimate of the distribution that is being sampled in the biased simulation: $p_{n}^{WT}(s) = \frac{1}{n} \sum_{k}^{n} G(s, s_{k})$, where $s_{k}$ is the CV value sampled at step $k$.

As the simulation converges, $p_{n}^{WT}(s)$ approaches the target well-tempered distribution $p^{WT}(s)$. Thus, we use the approximation $P(s) \propto [p_{n}^{WT}(s)]^{\gamma}$ and write the bias as $V_{n}(s) = (\gamma - 1) \frac{1}{\beta} \log \left( \frac{p_{n}^{WT}(s)}{Z_{n}} + \epsilon \right)$.

Both OPES variants are applications of the general bias potential $V(s) = -\frac{1}{\beta} \log \frac{p^{tg}(s)}{P(s)}$, but OPES estimates $P(s)$ on the fly and uses it to calculate the bias, while OPES-explore does the same but with $p^{WT}(s) \propto [P(s)]^{1/\gamma}$.

The free energy surface as a function of the CVs can be estimated in two distinct ways, either directly from the probability estimate, $F_{n}(s) = -\gamma \frac{1}{\beta} \log p_{n}^{WT}(s)$, or via importance sampling reweighting, e.g., using a weighted KDE, $F_{n}(s) = -\frac{1}{\beta} \log \sum_{k}^{n} e^{\beta V_{k-1}(s_{k})} G(s, s_{k})$.
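
The relation between the well-tempered estimate, the bias, and the free energy can be verified on a small table of values. Everything in this sketch is illustrative: the function names are hypothetical, $\epsilon$ is passed in explicitly (set to zero here) because its value for OPES-explore is not specified above, and $p^{WT}$ is taken to be exactly $[P(s)]^{1/\gamma}$ as at convergence.

```python
import math

def explore_bias(p_wt, z, eps, beta, gamma):
    """V_n(s) = (gamma - 1) / beta * log(p_wt / Z_n + eps)."""
    return (gamma - 1.0) / beta * math.log(p_wt / z + eps)

def fes_direct(p_wt, beta, gamma):
    """Direct estimate F(s) = -gamma / beta * log p_wt(s)."""
    return -gamma / beta * math.log(p_wt)

beta, gamma = 1.0, 10.0
free_energy = [0.0, 2.0, 5.0]                       # reference F(s), in kT
p_unbiased = [math.exp(-beta * f) for f in free_energy]
p_wt = [p ** (1.0 / gamma) for p in p_unbiased]     # converged well-tempered estimate

fes = [fes_direct(p, beta, gamma) for p in p_wt]
bias = [explore_bias(p, 1.0, 0.0, beta, gamma) for p in p_wt]
print(fes, bias)
```

The direct estimate returns the reference free energy, and the bias equals $-(1 - 1/\gamma) F(s)$, the same convergence relation as in well-tempered MetaD.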

Differences between OPES and OPES-explore

The idea behind the definition of the bias potential is the same in both variants, i.e., using as target the well-tempered distribution $p^{tg}(s) = p^{WT}(s) \propto [P(s)]^{1/\gamma}$.

The way the probability distribution is estimated is different. Specifically, the unbiased probability $P(s)$ is estimated on the fly using a weighted KDE in OPES (NOTE: the bias is well-tempered in OPES), while the well-tempered (biased) probability $p^{WT}(s) \propto [P(s)]^{1/\gamma}$ is estimated using an unweighted (averaged) KDE in OPES-explore.

Therefore, the OPES method focuses on fast convergence, while OPES-explore focuses on fast exploration.

[Figure: opes_vs_explore2.png]

Summary

📖 This Notebook summarizes the OPES methodology based on the author's reading of the original literature and practical experience.

🎉 If you would like to go further, you may want to read the following:

Extension of the OPES method:

Application of the OPES method:
