Markov State Models (MSM) for Post-Processing Molecular Dynamics Simulation Trajectories

Tags: Bioinformatics, Molecular Dynamics, MSM

zhangjun19

Published 2023-07-31

Recommended image: Basic Image:bohrium-notebook:2023-04-07

Recommended machine type: c2_m4_cpu

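📖 Getting Started Guide
Usage: this notebook can be run directly on Bohrium Notebook. Click the blue button at the top of the page to connect, select the bohrium-notebook:2023-04-07 image and any node configuration, and wait a moment for it to start. If you run into any problems, contact [bohrium@dp.tech](mailto:bohrium@dp.tech).

Objectives:

Understand the Markov state models used to post-process molecular dynamics simulation trajectories

Learn the steps of dimensionality reduction, model building and analysis for simulation trajectories

Background:

You should already know how to:

Run molecular dynamics simulations with software such as Gromacs, Amber or NAMD to obtain simulation trajectories.

Reference:

http://msmbuilder.org/development/examples/Fs-Peptide-in-RAM.html
Husic, B. E. & Pande, V. S. Markov State Models: From an Art to a Science. J. Am. Chem. Soc. 140, 2386–2396 (2018).
An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation. vol. 797 (Springer Netherlands, 2014).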

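Preface

Markov state models

A Markov state model (MSM) is a mathematical model applied to molecular dynamics simulations. It identifies kinetically relevant states and describes the interconversion between them by estimating a transition probability matrix. Because MSMs allow massively parallel sampling and a statistical description of the system, a set of short MD simulations can be used to predict thermodynamic properties and kinetics on long timescales (for example, milliseconds). In recent years MSMs have been widely applied to protein folding, molecular recognition, small-molecule diffusion and related problems.

Basic principles of MSMs

The basic structure of a Markov state model can be written as follows:

State space: let the state space be {S1, S2, ..., Sn}, where Si denotes the i-th state.
Transition probabilities: the transition matrix P gives the probability that the system moves from state Si to state Sj, P(Si, Sj) = P(X(t+1) = Sj | X(t) = Si), where X(t) is the state of the system at time t.
Markov property: written as a conditional probability, P(X(t+1) = Sj | X(t) = Si, X(t-1) = Sk, ..., X(0) = S0) = P(X(t+1) = Sj | X(t) = Si). Given the current state, the probability of the next state depends only on the current state and not on the earlier history.
Time homogeneity: the transition probabilities do not change during the process, i.e. P(Si, Sj) = P(X(t+1) = Sj | X(t) = Si) holds for every time t.
Multi-step transitions: the transition matrix P also gives the transition probabilities over longer times. For t steps, the t-step transition matrix is P^t, where (P^t)(Si, Sj) = P(X(t) = Sj | X(0) = Si) is the probability of finding the system in state Sj at time t given that it started in state Si at time 0.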
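The multi-step rule can be checked on a small example. The sketch below is only an illustration: the 3-state matrix `P` is made up, and the helper names `matmul` and `matrix_power` are hypothetical. It uses plain Python lists so that it runs without extra dependencies.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(p, t):
    """Return the t-step transition matrix P^t for an integer t >= 0."""
    n = len(p)
    # Start from the identity matrix (zero steps: the state does not change).
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, p)
    return result


# Hypothetical 3-state transition matrix; row i holds P(S_i -> S_j).
P = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]

P10 = matrix_power(P, 10)
for row in P10:
    print(['{:.3f}'.format(x) for x in row])
```

Although S1 cannot reach S3 in a single step here, the corresponding entry of `P10` is positive, and every row of `P10` still sums to 1.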

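Building an MSM

Workflow

To build a dynamical model, we apply a series of dimensionality reductions step by step. The basic steps are outlined below. Note that most steps are optional in some circumstances. The details will become clearer as you read on.

Set up the molecular dynamics system and run one or more simulations for as long as possible on as many CPUs or GPUs as possible. Many excellent packages exist for running MD, such as OpenMM, Gromacs, Amber and CHARMM. MSMBuilder is not one of them.
Featurize the trajectories into suitable feature vectors. The full set of 3N atomic coordinates is likely to be unwieldy and redundant, and it probably does not respect the rotational or translational symmetry of your system. Backbone dihedral angles are a common choice, although this depends strongly on the system being modeled.
Decompose the features into a new basis that preserves the relevant information in fewer dimensions. tICA is the usual choice: it finds linear combinations of the input degrees of freedom that maximize autocorrelation, or "slowness".
Cluster the data to define (micro)states by grouping similar input data points. At this stage the dimensionality of the problem has been reduced from potentially thousands of xyz coordinates to a single cluster (state) index.
Estimate a model from the clustered data. Typically this is an MSM, which models the important dynamics of the system.
Use GMRQ cross-validation to select the best model. The workflow has many hyperparameters (adjustable knobs), and this scoring function helps to choose good values.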
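As a rough illustration of the "estimate a model from the clustered data" step, the sketch below counts lagged transitions in a made-up cluster-index trajectory and row-normalizes them. This is the simplest (non-reversible) estimator and is an assumption made for illustration; it is not how MSMBuilder or PyEMMA estimate their models by default. The function name and the data are hypothetical.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalize lagged transition counts from one discrete trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair every frame with the frame `lag` steps later and count the jump.
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state without outgoing counts gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return counts, matrix


# Hypothetical cluster-index trajectory (the output of the clustering step).
dtraj = [0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 1, 1, 0, 0, 0, 2, 2, 1]
counts, T = estimate_transition_matrix(dtraj, n_states=3, lag=1)

for row in T:
    print(['{:.2f}'.format(x) for x in row])
```

Each row of `T` is an estimate of the probabilities of leaving the corresponding state within one lag time.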

We use the PyEMMA package for processing and analysis.

[ ]

%%bash

conda install -y -c conda-forge pyemma


[ ]

!pip install mdshare

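Showcase pentapeptide: a PyEMMA walkthrough

This case study introduces the most basic features of PyEMMA, using the analysis of a pentapeptide as an example workflow for the MSM analysis of molecular dynamics trajectories. The PyEMMA workflow is demonstrated on trajectories from 25 independent pentapeptide simulations.

The pentapeptide was simulated in implicit solvent, and the trajectories have a time step of 0.1 ns between frames.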

[ ]

import warnings

warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt

import matplotlib as mpl

import numpy as np

import mdshare

import pyemma

from pyemma.util.contexts import settings

# Use mdshare's fetch function to download the pentapeptide PDB and XTC trajectory files into the data folder.

pdb = mdshare.fetch('pentapeptide-impl-solv.pdb', working_directory='data')

files = mdshare.fetch('pentapeptide-*-500ns-impl-solv.xtc', working_directory='data')

print(pdb)

print(files)

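Since we do not know in advance which features describe the system best, we start with a broad analysis. For simplicity we are only interested in modeling the backbone dynamics, so we consider only features that describe the backbone and ignore the side chains. In PyEMMA, the featurizer is the central object that holds the system topology. Features are computed by adding the desired feature to it, for example with featurizer.add_backbone_torsions(). We load backbone torsions, backbone heavy-atom positions and backbone heavy-atom distances. ⚠️ Note that the structures have been aligned beforehand. Because the alignment loses track of the periodic box, we have to switch off the periodic flag for the distance and torsion calculations.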

[ ]

%matplotlib inline

torsions_feat = pyemma.coordinates.featurizer(pdb)

torsions_feat.add_backbone_torsions(cossin=True, periodic=False)

torsions_data = pyemma.coordinates.load(files, features=torsions_feat)

labels = ['backbone\ntorsions']

positions_feat = pyemma.coordinates.featurizer(pdb)

positions_feat.add_selection(positions_feat.select_Backbone())

positions_data = pyemma.coordinates.load(files, features=positions_feat)

labels += ['backbone atom\npositions']

distances_feat = pyemma.coordinates.featurizer(pdb)

distances_feat.add_distances(

distances_feat.pairs(distances_feat.select_Backbone(), excluded_neighbors=2), periodic=False)

distances_data = pyemma.coordinates.load(files, features=distances_feat)

labels += ['backbone atom\ndistances']

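Feature selection

We now rank the three featurizations by their VAMP2 score, which measures the kinetic variance contained in the features (mcgibbon-15, wu-17, mardt-17). The minimum value of this score is 1, which corresponds to the invariant measure, i.e. equilibrium. Because we compare featurizations of different dimensionality, we use the dim parameter to set an upper bound on the number of dynamical processes included in the score.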
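To see why the VAMP-2 score has a lower bound of 1 and grows with slow dynamics, consider a toy case. The sketch below is an assumption made for illustration and not PyEMMA's implementation: for a reversible transition matrix the VAMP-2 score of the top k processes equals the sum of the squared k largest eigenvalue magnitudes, and a 2-state chain has the eigenvalues 1 and 1 - a - b. The function name is hypothetical.

```python
def vamp2_two_state(a, b, k=2):
    """VAMP-2 score of the 2-state chain [[1-a, a], [b, 1-b]].

    Every 2-state chain with a, b > 0 is reversible, so the singular values
    of its whitened transition matrix are the eigenvalue magnitudes:
    1 (stationary process) and |1 - a - b| (exchange process).
    """
    singular_values = sorted([1.0, abs(1.0 - a - b)], reverse=True)
    return sum(s ** 2 for s in singular_values[:k])


slow_score = vamp2_two_state(0.01, 0.02)   # rare jumps, slow exchange
fast_score = vamp2_two_state(0.5, 0.5)     # memory is lost after one step
print(slow_score, fast_score)
```

The stationary process always contributes 1, which is the minimum. A slow exchange process adds almost another full unit, while a process that decorrelates within one lag time adds nothing.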


[ ]

print(distances_feat)

[ ]

def score_cv(data, dim, lag, number_of_splits=10, validation_fraction=0.5):
    """Compute a cross-validated VAMP2 score.

    We randomly split the list of independent trajectories into
    a training and a validation set, compute the VAMP2 score,
    and repeat this process several times.

    Parameters
    ----------
    data : list of numpy.ndarrays
        The input data.
    dim : int
        Number of processes to score; equivalent to the dimension
        after projecting the data with VAMP2.
    lag : int
        Lag time for the VAMP2 scoring.
    number_of_splits : int, optional, default=10
        How often do we repeat the splitting and score calculation.
    validation_fraction : float, optional, default=0.5
        Fraction of trajectories which should go into the validation
        set during a split.
    """
    # we temporarily suppress very short-lived progress bars
    with pyemma.util.contexts.settings(show_progress_bars=False):
        nval = int(len(data) * validation_fraction)
        scores = np.zeros(number_of_splits)
        for n in range(number_of_splits):
            ival = np.random.choice(len(data), size=nval, replace=False)
            vamp = pyemma.coordinates.vamp(
                [d for i, d in enumerate(data) if i not in ival], lag=lag, dim=dim)
            scores[n] = vamp.score([d for i, d in enumerate(data) if i in ival])
    return scores

import matplotlib.pyplot as plt

dim = 10

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)

for ax, lag in zip(axes.flat, [5, 10, 20]):
    torsions_scores = score_cv(torsions_data, lag=lag, dim=dim)
    scores = [torsions_scores.mean()]
    errors = [torsions_scores.std()]
    positions_scores = score_cv(positions_data, lag=lag, dim=dim)
    scores += [positions_scores.mean()]
    errors += [positions_scores.std()]
    distances_scores = score_cv(distances_data, lag=lag, dim=dim)
    scores += [distances_scores.mean()]
    errors += [distances_scores.std()]
    ax.bar(labels, scores, yerr=errors, color=['C0', 'C1', 'C2'])
    ax.set_title(r'lag time $\tau$={:.1f}ns'.format(lag * 0.1))
    if lag == 5:
        # save for later
        vamp_bars_plot = dict(
            labels=labels, scores=scores, errors=errors, dim=dim, lag=lag)

axes[0].set_ylabel('VAMP2 score')

fig.tight_layout()

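⚠️ Note that the VAMP-2 score is not suitable for choosing a lag time, because scores at different lag times are not comparable.

Here we only compare the different featurizations separately at each given lag time. Backbone torsions consistently show a slight advantage.

We therefore add a more detailed VAMP2 analysis of the backbone torsions, varying the dimension parameter at several lag times: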

[ ]

lags = [1, 2, 5, 10, 20]

dims = [i + 1 for i in range(10)]

fig, ax = plt.subplots()

for i, lag in enumerate(lags):
    scores_ = np.array([score_cv(torsions_data, dim, lag)
                        for dim in dims])
    scores = np.mean(scores_, axis=1)
    errors = np.std(scores_, axis=1, ddof=1)
    color = 'C{}'.format(i)
    ax.fill_between(dims, scores - errors, scores + errors, alpha=0.3, facecolor=color)
    ax.plot(dims, scores, '--o', color=color, label='lag={:.1f}ns'.format(lag * 0.1))

ax.legend()

ax.set_xlabel('number of dimensions')

ax.set_ylabel('VAMP2 score')

fig.tight_layout()

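We observe that for lag times above 0.5 ns, using more than four dimensions does not increase the score; the first four dimensions contain all relevant information about the slow dynamics. Based on this result, we try a TICA projection with a lag time of 0.5 ns (5 steps). Note that this is a modeler's choice based on the best heuristics currently available. The TICA lag time may need to be readjusted after MSM estimation.

Coordinate transform and discretization

TICA

The goal of the next step is to find a function that maps the usually high-dimensional input space into a low-dimensional space that captures the important dynamics. The recommended method is time-lagged independent component analysis (TICA) (molgedey-94, perez-hernandez-13). We run TICA with the lag time obtained from the VAMP-2 analysis. With its default parameters, tica() keeps as many dimensions as are needed to retain 95% of the kinetic variance, and it applies kinetic map scaling. This scaling ensures that Euclidean distances in the projected space approximate kinetic distances, which helps the subsequent discretization. Note that the PyEMMA API is consistent across all estimators: calling the TICA estimator with data (tica = pyemma.coordinates.tica(torsions_data)) performs the estimation and returns an estimator instance (tica) that holds all information about the transformation. For small systems we can access the transformed data by calling tica.get_output(). For large systems we recommend passing the tica object itself to the subsequent stage, for example clustering, to avoid loading all transformed data into memory.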
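TICA ranks directions by their autocorrelation at the chosen lag time. The sketch below is not TICA itself. It only computes the quantity TICA maximizes, the normalized time-lagged autocorrelation, for two synthetic one-dimensional signals. The signals and the function name are assumptions made for illustration.

```python
import math


def lagged_autocorrelation(x, lag):
    """Normalized autocorrelation of a 1D series at the given lag (in frames)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    variance = sum(v * v for v in centered) / len(centered)
    pairs = zip(centered[:-lag], centered[lag:])
    covariance = sum(a * b for a, b in pairs) / (len(centered) - lag)
    return covariance / variance


n_frames = 2000
# A slow coordinate (period 1000 frames) and a fast one (period 16 frames).
slow = [math.sin(2 * math.pi * t / 1000) for t in range(n_frames)]
fast = [math.sin(2 * math.pi * t / 16) for t in range(n_frames)]

lag = 4
slow_ac = lagged_autocorrelation(slow, lag)
fast_ac = lagged_autocorrelation(fast, lag)
print('slow: {:.3f}, fast: {:.3f}'.format(slow_ac, fast_ac))
```

At a lag of 4 frames the slow coordinate is still almost perfectly correlated with itself, while the fast one has decorrelated, so a TICA-like criterion would rank the slow coordinate first.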


[ ]

tica = pyemma.coordinates.tica(torsions_data, lag=5)

tica_output = tica.get_output()

tica_concatenated = np.concatenate(tica_output)

# Visualize the marginal and joint distributions of our TICA components by simple histogramming:

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

pyemma.plots.plot_feature_histograms(

tica_concatenated,

ax=axes[0],

feature_labels=['IC1', 'IC2', 'IC3', 'IC4'],

ylog=True)

pyemma.plots.plot_density(*tica_concatenated[:, :2].T, ax=axes[1], logscale=True)

axes[1].set_xlabel('IC 1')

axes[1].set_ylabel('IC 2')

fig.tight_layout()

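We note that the projection yields well-defined high-density clusters, which most likely correspond to metastable basins. Let us look at one of the trajectories in the space of the first four TICA components. The TICA components resolve the slow transitions as discrete jumps, so metastability is well described in this projection.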

[ ]

fig, axes = plt.subplots(4, 1, figsize=(12, 5), sharex=True)

x = 0.1 * np.arange(tica_output[0].shape[0])

for i, (ax, tic) in enumerate(zip(axes.flat, tica_output[0].T)):
    ax.plot(x, tic)
    ax.set_ylabel('IC {}'.format(i + 1))

axes[-1].set_xlabel('time / ns')

fig.tight_layout()

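Discretization

The TICA coordinates are now clustered into a number of discrete states with the k-means algorithm. k-means requires the desired number of clusters as input. The trajectories are assigned to the cluster centers automatically, and the result is available as cluster.dtrajs.

⚠️ The optimal number of cluster centers k is not clear a priori. It depends strongly on the distribution of our data and on the number of dimensions we use.

In the following, we estimate unvalidated Markov models with different numbers of cluster centers and use the cross-validated VAMP-2 score as a heuristic. This approach requires us to guess an MSM lag time, which we set to the TICA lag time of 5 steps (0.5 ns). Because the clustering algorithm is stochastic, we run several rounds of discretization for each number of cluster centers.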
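The assignment step that turns continuous coordinates into a discrete trajectory can be sketched in a few lines. The example below assumes the cluster centers are already known and only shows the nearest-center assignment. The centers, the points and the function name are made up for illustration.

```python
def assign_to_centers(points, centers):
    """Return, for each point, the index of the nearest cluster center."""
    def squared_distance(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

    return [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points]


# Hypothetical cluster centers in a 2D (IC 1, IC 2) space.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
# Hypothetical projected trajectory frames.
trajectory = [(0.2, -0.1), (0.4, 0.3), (3.7, 0.2),
              (4.1, -0.3), (0.3, 3.8), (0.1, 0.2)]

dtraj = assign_to_centers(trajectory, centers)
print(dtraj)  # → [0, 0, 1, 1, 2, 0]
```

The resulting list of integers plays the same role as one entry of a list of discrete trajectories: every frame is replaced by a single state index.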


[ ]

n_clustercenters = [5, 10, 30, 75, 200, 450]

scores = np.zeros((len(n_clustercenters), 5))

for n, k in enumerate(n_clustercenters):
    for m in range(5):
        with pyemma.util.contexts.settings(show_progress_bars=False):
            _cl = pyemma.coordinates.cluster_kmeans(
                tica_output, k=k, max_iter=50, stride=50)
            _msm = pyemma.msm.estimate_markov_model(_cl.dtrajs, 5)
            scores[n, m] = _msm.score_cv(
                _cl.dtrajs, n=1, score_method='VAMP2', score_k=min(10, k))

fig, ax = plt.subplots()

lower, upper = pyemma.util.statistics.confidence_interval(scores.T.tolist(), conf=0.9)

ax.fill_between(n_clustercenters, lower, upper, alpha=0.3)

ax.plot(n_clustercenters, np.mean(scores, axis=1), '-o')

ax.semilogx()

ax.set_xlabel('number of cluster centers')

ax.set_ylabel('VAMP-2 score')

fig.tight_layout()

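We find that the VAMP-2 score has already saturated at 75 states, and we use this number for the further analysis. As noted above, the scores were computed from unvalidated MSMs, so the plot above is only a heuristic. Besides obtaining the best score, we also want a model that describes physically interesting states. The number of states k is therefore often readjusted after inspecting the model.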

[ ]

cluster = pyemma.coordinates.cluster_kmeans(

tica_output, k=75, max_iter=50, stride=10, fixed_seed=1)

dtrajs_concatenated = np.concatenate(cluster.dtrajs)

fig, ax = plt.subplots(figsize=(4, 4))

pyemma.plots.plot_density(

*tica_concatenated[:, :2].T, ax=ax, cbar=False, alpha=0.3)

ax.scatter(*cluster.clustercenters[:, :2].T, s=5, c='C1')

ax.set_xlabel('IC 1')

ax.set_ylabel('IC 2')

fig.tight_layout()

The states are well distributed in the low-dimensional TICA subspace.

MSM estimation and validation

Implied timescales

[ ]

its = pyemma.msm.its(cluster.dtrajs, lags=50, nits=10, errors='bayes')

pyemma.plots.plot_implied_timescales(its, units='ns', dt=0.1);

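The solid lines correspond to the implied timescales (ITS) of the maximum likelihood MSM. The confidence intervals are shown as shaded areas; they contain 95% of the samples generated by the Bayesian MSM. The sample means are shown as dashed lines. The implied timescales converge quickly. Above 0.5 ns, the implied timescales of the slowest processes are constant within the error. We therefore choose a lag time of 5 steps (0.5 ns) to build the Markov model. As a quick check, we print the fraction of states and the fraction of counts in the active set. Note the similarity between the msm object and the tica object: both are estimator instances that contain all relevant information from the estimation as well as methods for validation and further analysis. To keep track of the trajectory time step, a dt_traj keyword argument containing the time step and its unit can be passed.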
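The implied timescales in such a plot follow from the eigenvalues of the transition matrix through t_i = -τ / ln|λ_i(τ)|. The sketch below applies this relation to a hypothetical 2-state chain. The jump probabilities, the lag time and the function name are assumptions made for illustration.

```python
import math


def implied_timescales(eigenvalues, lag_time):
    """t_i = -lag_time / ln|lambda_i| for every non-stationary eigenvalue."""
    return [-lag_time / math.log(abs(lam))
            for lam in eigenvalues
            if 0.0 < abs(lam) < 1.0]


# Hypothetical 2-state chain with jump probabilities a and b per lag time.
a, b = 0.02, 0.03
lag_ns = 0.5

# Eigenvalues of the transition matrix at lag tau ...
eigs_tau = [1.0, 1.0 - a - b]
# ... and at lag 2*tau, where a Markovian process gives the squared eigenvalues.
eigs_2tau = [lam ** 2 for lam in eigs_tau]

its_tau = implied_timescales(eigs_tau, lag_ns)
its_2tau = implied_timescales(eigs_2tau, 2 * lag_ns)
print('ITS at tau: {:.3f} ns, at 2 tau: {:.3f} ns'.format(its_tau[0], its_2tau[0]))
```

For an exactly Markovian process the timescale does not depend on the lag time, which is why a plateau in the implied timescales plot is used to pick the lag time.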


[ ]

msm = pyemma.msm.bayesian_markov_model(cluster.dtrajs, lag=5, dt_traj='0.1 ns')

print('fraction of states used = {:.2f}'.format(msm.active_state_fraction))

print('fraction of counts used = {:.2f}'.format(msm.active_count_fraction))

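Chapman-Kolmogorov test

The model is validated with a Chapman-Kolmogorov test, which compares the left and right sides of the Chapman-Kolmogorov equation

$P(k\tau) = P^{k}(\tau)$

Here P(τ) is the transition matrix at lag time τ. PyEMMA automatically estimates a new MSM transition matrix at lag time kτ and propagates the original transition matrix P(τ) by taking its k-th power. The largest k can be adjusted with the mlags keyword argument of msm.cktest(). Because the result can only be inspected for a small number of (macro)states, we use the implied timescales plot as a heuristic to estimate the number of metastable states to test for. We can resolve 4 slow processes up to lag times of 2.5 ns. Since the Chapman-Kolmogorov test involves estimations at higher lag times, we try to capture these processes by choosing 5 metastable states.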
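The idea behind the test can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions and not PyEMMA's cktest: it simulates a made-up 2-state Markov chain, estimates one transition matrix at lag 1 and one at lag 2 by simple counting, and compares the lag-2 estimate with the square of the lag-1 matrix. All names and numbers are hypothetical.

```python
import random


def estimate(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    return [[c / sum(row) for c in row] for row in counts]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Simulate a hypothetical 2-state Markov chain with a fixed seed.
rng = random.Random(0)
true_T = [[0.9, 0.1], [0.2, 0.8]]
state, dtraj = 0, []
for _ in range(20000):
    dtraj.append(state)
    state = 0 if rng.random() < true_T[state][0] else 1

T1 = estimate(dtraj, 2, lag=1)
predicted = matmul(T1, T1)            # right-hand side: P(tau)^2
estimated = estimate(dtraj, 2, lag=2)  # left-hand side: P(2 tau)

max_dev = max(abs(predicted[i][j] - estimated[i][j])
              for i in range(2) for j in range(2))
print('largest deviation: {:.4f}'.format(max_dev))
```

Because the synthetic data are Markovian by construction, prediction and estimate agree up to statistical noise. For real data, a systematic deviation at larger k indicates that the lag time or the discretization is not adequate.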


[ ]

nstates = 5

cktest = msm.cktest(nstates, mlags=6)

pyemma.plots.plot_cktest(cktest, dt=0.1, units='ns');

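Assuming 5 metastable states, the model passes the Chapman-Kolmogorov test.

MSM spectral analysis

A variety of properties can be obtained from the msm object. We start with a spectral analysis of the implied timescales.

We continue by analyzing the stationary distribution and the free energy computed over the first two TICA coordinates. The stationary distribution π is stored in msm.pi or (as an alias) msm.stationary_distribution. We compute the free energy landscape by reweighting the trajectory frames with the stationary probabilities from the MSM (returned by msm.trajectory_weights()).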
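The link between the transition matrix, the stationary distribution and the free energy can be shown on a toy model. The sketch below uses simple power iteration on a made-up 2-state matrix. This is an assumption chosen for brevity and says nothing about how PyEMMA computes msm.pi. The function name is hypothetical.

```python
import math


def stationary_distribution(T, n_iter=10000, tol=1e-14):
    """Iterate pi <- pi T from a uniform start until pi stops changing."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical 2-state transition matrix.
T = [[0.9, 0.1],
     [0.2, 0.8]]

pi = stationary_distribution(T)
# Free energy of each state in units of kT, shifted so the minimum is zero.
raw = [-math.log(p) for p in pi]
free_energy = [f - min(raw) for f in raw]
print(pi, free_energy)
```

State 0 is left half as often as state 1, so it carries two thirds of the population and lies ln 2 ≈ 0.69 kT below state 1.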


[ ]

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

pyemma.plots.plot_contour(

*tica_concatenated[:, :2].T,

msm.pi[dtrajs_concatenated],

ax=axes[0],

mask=True,

cbar_label='stationary distribution')

pyemma.plots.plot_free_energy(

*tica_concatenated[:, :2].T,

weights=np.concatenate(msm.trajectory_weights()),

ax=axes[1],

legacy=False)

for ax in axes.flat:
    ax.set_xlabel('IC 1')

axes[0].set_ylabel('IC 2')

axes[0].set_title('Stationary distribution', fontweight='bold')

axes[1].set_title('Reweighted free energy surface', fontweight='bold')

fig.tight_layout()

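The eigenvectors corresponding to the slowest processes (largest implied timescales) contain information about which configurational changes happen on which timescales. We analyze the slowest processes by inspecting the values of the first four eigenfunctions projected onto the first two TICA coordinates. Since the first right eigenvector corresponds to the stationary process (equilibrium), it is constant at 1.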

[ ]

eigvec = msm.eigenvectors_right()

print('The first eigenvector is one: {} (min={}, max={})'.format(

np.allclose(eigvec[:, 0], 1, atol=1e-15), eigvec[:, 0].min(), eigvec[:, 0].max()))

fig, axes = plt.subplots(1, 4, figsize=(15, 3), sharex=True, sharey=True)

for i, ax in enumerate(axes.flat):
    pyemma.plots.plot_contour(
        *tica_concatenated[:, :2].T,
        eigvec[dtrajs_concatenated, i + 1],
        ax=ax,
        cmap='PiYG',
        cbar_label='{}. right eigenvector'.format(i + 2),
        mask=True)
    ax.set_xlabel('IC 1')

axes[0].set_ylabel('IC 2')

fig.tight_layout()

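The eigenvectors of the MSM contain information about the conformational changes taking place, governed by the corresponding implied timescale. Specifically, the minimum and maximum components of an eigenvector indicate the states between which the process shifts probability. The relaxation timescale of this exchange process is exactly the implied timescale.

Because the eigenvectors are sorted internally by their eigenvalues, the plots above depict the four slowest processes of the implied timescales plot. We see that the slowest processes indeed take place between the dense clusters in the TICA projection.

PCCA & TPT

Perron cluster cluster analysis

Note: We assign the integer numbers 1 ... nstates to the PCCA++ metastable states. Because PyEMMA is written in Python, its internal indexing starts at 0. Numbers in code cells therefore differ by -1 from those in plot labels and markdown text.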

[ ]

msm.pcca(nstates)

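The PCCA++ algorithm computes so-called memberships, i.e. the probability of each microstate belonging to a given macrostate. In other words, PCCA++ assigns microstates fuzzily to macrostates and encodes the assignment in the memberships. The 5 membership distributions over the first two TICA dimensions look as follows: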

[ ]

fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharex=True, sharey=True)

for i, ax in enumerate(axes.flat):
    pyemma.plots.plot_contour(
        *tica_concatenated[:, :2].T,
        msm.metastable_distributions[i][dtrajs_concatenated],
        ax=ax,
        cmap='afmhot_r',
        mask=True,
        cbar_label='metastable distribution {}'.format(i + 1))
    ax.set_xlabel('IC 1')

axes[0].set_ylabel('IC 2')

fig.tight_layout()

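As we can see, the membership probabilities roughly match the basins of the free energy landscape shown above.

In some cases it is useful to turn these distributions into crisp assignments. This is done by taking, for each microstate, the argmax of its membership probabilities over the macrostates. The result is stored in msm.metastable_assignments. Let us see what this looks like in the first two TICA projections.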
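The argmax rule is easy to illustrate on a small membership matrix. The numbers below are invented, and the function name is hypothetical. Row i holds the membership probabilities of microstate i, one column per macrostate.

```python
def crisp_assignments(memberships):
    """Assign each microstate to the macrostate with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical memberships of 4 microstates in 3 macrostates (rows sum to 1).
memberships = [[0.95, 0.05, 0.00],
               [0.60, 0.35, 0.05],
               [0.10, 0.80, 0.10],
               [0.02, 0.08, 0.90]]

assignments = crisp_assignments(memberships)
# Group the microstate indices by macrostate.
metastable_sets = [[i for i, s in enumerate(assignments) if s == k]
                   for k in range(3)]
print(assignments)      # → [0, 0, 1, 2]
print(metastable_sets)  # → [[0, 1], [2], [3]]
```

Microstate 1 ends up entirely in macrostate 0 although 35% of its membership belongs to macrostate 1, which shows the information that a crisp assignment discards.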


[ ]

metastable_traj = msm.metastable_assignments[dtrajs_concatenated]

fig, ax = plt.subplots(figsize=(5, 4))

_, _, misc = pyemma.plots.plot_state_map(

*tica_concatenated[:, :2].T, metastable_traj, ax=ax)

ax.set_xlabel('IC 1')

ax.set_ylabel('IC 2')

misc['cbar'].set_ticklabels([r'$\mathcal{S}_%d$' % (i + 1)

for i in range(nstates)])

fig.tight_layout()

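As expected, PCCA++ separates our state space well in the first two TICA components.

At this point we usually want to know which molecular structures the identified metastable states correspond to. We generate a number of representative sample structures for each macrostate and store them in trajectory files for visual inspection. The following cell writes trajectory files to disk. They can be loaded and analyzed with external software.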

[ ]

pcca_samples = msm.sample_by_distributions(msm.metastable_distributions, 10)

torsions_source = pyemma.coordinates.source(files, features=torsions_feat)

pyemma.coordinates.save_trajs(

torsions_source,

pcca_samples,

outfiles=['./data/pcca{}_10samples.pdb'.format(n + 1)

for n in range(msm.n_metastable)])

We can also visualize the structures in this notebook with NGLView. For this we provide a custom function that defines the representation of the molecule:

[ ]

def visualize_metastable(samples, cmap, selection='not element H'):
    """ visualize metastable states

    Parameters
    ----------
    samples: list of mdtraj.Trajectory objects
        each element contains all samples for one metastable state.
    cmap: matplotlib.colors.ListedColormap
        color map used to visualize metastable states before.
    selection: str
        which part of the molecule to select for visualization. For details have a look here:
        http://mdtraj.org/latest/examples/atom-selection.html#Atom-Selection-Language
    """
    import nglview
    from matplotlib.colors import to_hex

    widget = nglview.NGLWidget()
    widget.clear_representations()
    ref = samples[0]
    for i, s in enumerate(samples):
        s = s.superpose(ref, atom_indices=s.top.select('resid 2 3 and mass > 2'))
        s = s.atom_slice(s.top.select(selection))
        comp = widget.add_trajectory(s)
        comp.add_licorice()

    # this has to be done in a separate loop for whatever reason...
    x = np.linspace(0, 1, num=len(samples))
    for i, x_ in enumerate(x):
        c = to_hex(cmap(x_))
        widget.update_licorice(color=c, component=i, repr_index=i)
        widget.remove_cartoon(component=i)
    return widget

As with saving trajectories to disk, we now build a list of mdtraj.Trajectory objects containing our metastable-state samples. These objects are visualized with NGLView as follows:

[ ]

my_samples = [pyemma.coordinates.save_traj(files, idist, outfile=None, top=pdb)

for idist in msm.sample_by_distributions(msm.metastable_distributions, 50)]

cmap = mpl.cm.get_cmap('viridis', nstates)

visualize_metastable(my_samples, cmap)

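This coarse-grained representation of the dynamics is easier for humans to interpret. As with a conventional MSM, we can still compute several interesting properties. We start with the stationary distribution, which encodes the free energy of the states. For a coarse-grained state $S_{i}$ it is obtained by summing over all contributions:

$G_{i} = -k_{B}T \ln \sum_{j \in S_{i}} \pi_{j}$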
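A direct transcription of this formula, in units of k_B T, looks as follows. The stationary probabilities and the metastable sets are made up for illustration, and the function name is hypothetical.

```python
import math


def macrostate_free_energies(pi, metastable_sets):
    """G_i / kT = -ln(sum of pi_j over all microstates j in set S_i)."""
    return [-math.log(sum(pi[j] for j in s)) for s in metastable_sets]


# Hypothetical stationary distribution over 4 microstates.
pi = [0.40, 0.30, 0.20, 0.10]
# Hypothetical partition of the microstates into 3 metastable sets.
metastable_sets = [[0, 1], [2], [3]]

G = macrostate_free_energies(pi, metastable_sets)
for i, g in enumerate(G):
    print('S_{}: {:.3f} kT'.format(i + 1, g))
```

The most populated set has the lowest free energy, and exponentiating -G_i recovers the macrostate populations, which sum to 1.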


[ ]

print('state\tπ\t\tG/kT')

for i, s in enumerate(msm.metastable_sets):
    p = msm.pi[s].sum()
    print('{}\t{:f}\t{:f}'.format(i + 1, p, -np.log(p)))

Knowing the PCCA++ metastable states, we can also extract the mean first passage times (MFPTs) between them:
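For a discrete chain, the MFPT m_i from state i to a target set B satisfies m_i = 0 inside B and m_i = 1 + Σ_j T_ij m_j otherwise, measured in lag-time steps. The sketch below solves these equations by fixed-point iteration for a made-up 3-state chain. The method and all names are assumptions made for illustration, not PyEMMA's implementation.

```python
def mfpt_to_target(T, target, n_iter=100000, tol=1e-12):
    """Mean first passage time (in steps) from every state to the target set."""
    n = len(T)
    m = [0.0] * n
    for _ in range(n_iter):
        new = [0.0 if i in target else
               1.0 + sum(T[i][j] * m[j] for j in range(n) if j not in target)
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, m)) < tol:
            return new
        m = new
    return m


# Hypothetical 3-state linear chain: 0 <-> 1 <-> 2.
T = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]

m = mfpt_to_target(T, target={2})
lag_ns = 0.5  # assumed lag time, used to convert steps to nanoseconds
print(['{:.1f} ns'.format(steps * lag_ns) for steps in m])
```

Solving the two equations by hand gives 30 steps from state 0 and 20 steps from state 1, which the iteration reproduces.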


[ ]

from itertools import product

mfpt = np.zeros((nstates, nstates))

for i, j in product(range(nstates), repeat=2):
    mfpt[i, j] = msm.mfpt(
        msm.metastable_sets[i],
        msm.metastable_sets[j])

from pandas import DataFrame

print('MFPT / ns:')

DataFrame(np.round(mfpt, decimals=2), index=range(1, nstates + 1), columns=range(1, nstates + 1))

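As will become clear in the following sections, metastable state 1 can be distinguished experimentally from the other states. Using the Bayesian samples, we can extract the MFPTs into and out of this state, from and to any other state, as follows: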

[ ]

A = msm.metastable_sets[0]

B = np.concatenate(msm.metastable_sets[1:])

print('MFPT 1 -> other: ({:6.1f} ± {:5.1f}) ns'.format(

msm.sample_mean('mfpt', A, B), msm.sample_std('mfpt', A, B)))

print('MFPT other -> 1: ({:.1f} ± {:5.1f}) ns'.format(

msm.sample_mean('mfpt', B, A), msm.sample_std('mfpt', B, A)))

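The MFPT from state 1 to any other state is very short compared with the reverse direction, which means that this state is rather short-lived.

Transition path theory

The flux between metastable states can be computed and coarse-grained as follows. As an example, we compute the flux between metastable states 2 and 4.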

[ ]

start, final = 1, 3

A = msm.metastable_sets[start]

B = msm.metastable_sets[final]

flux = pyemma.msm.tpt(msm, A, B)

cg, cgflux = flux.coarse_grain(msm.metastable_sets)

The committor, projected onto the first two TICA dimensions, can be shown as a filled contour plot:

[ ]

fig, ax = plt.subplots(figsize=(5, 4))

pyemma.plots.plot_contour(

*tica_concatenated[:, :2].T,

flux.committor[dtrajs_concatenated],

cmap='brg',

ax=ax,

mask=True,

cbar_label=r'committor $\mathcal{S}_%d \to \mathcal{S}_%d$' % (

start + 1, final + 1))

fig.tight_layout()

We find that the committor is constant within the metastable sets defined above. Transition regions can be identified by committor values ≈ 0.5.
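The forward committor q_i is the probability of reaching the final set B before the start set A when starting in state i. It is 0 in A, 1 in B, and satisfies q_i = Σ_j T_ij q_j elsewhere. The sketch below solves this for a made-up 4-state linear chain by fixed-point iteration. It illustrates the definition only and is not PyEMMA's TPT implementation.

```python
def forward_committor(T, A, B, n_iter=100000, tol=1e-13):
    """Probability of hitting set B before set A, for every state."""
    n = len(T)
    q = [1.0 if i in B else 0.0 for i in range(n)]
    for _ in range(n_iter):
        new = [q[i] if (i in A or i in B) else
               sum(T[i][j] * q[j] for j in range(n))
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, q)) < tol:
            return new
        q = new
    return q


# Hypothetical 4-state linear chain: 0 <-> 1 <-> 2 <-> 3.
T = [[0.9, 0.1, 0.0, 0.0],
     [0.1, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.8, 0.1],
     [0.0, 0.0, 0.1, 0.9]]

q = forward_committor(T, A={0}, B={3})
print(['{:.3f}'.format(x) for x in q])
```

The two intermediate states get committor values of 1/3 and 2/3. In this toy chain the value 0.5 would lie between them, which is where a transition state would be located.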

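Computing experimental observables

Having thoroughly constructed, validated and analyzed our MSM, we may want to take the next step and compare the model with experimental data. PyEMMA can compute stationary and dynamic experimental observables; below we give a few examples. We will use some external library functions provided by MDTraj (mcgibbon-15).

In-depth theoretical descriptions and applications to various data can be found in the following references: OLSON-17, NOE-11, OLSON-16, LINDNER-13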

[ ]

from mdtraj import shrake_rupley, compute_rg

#We compute a maximum likelihood MSM for comparison

mlmsm = pyemma.msm.estimate_markov_model(cluster.dtrajs, lag=5, dt_traj='0.1 ns')

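To compute any experimental observable, PyEMMA needs the value of that observable for every Markov state of the MSM or for every frame of the trajectories. Computing these quantities depends on the experiment in question; it may require domain-specific knowledge or involve time-consuming calculations. Here we precompute the experimental observables in the following cells.

First, we draw 20 representative configurations from each Markov state. Note that 20 is not a universal number: a particular observable may need more representative configurations to converge.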

[ ]

markov_samples = [smpl for smpl in msm.sample_by_state(20)]

reader = pyemma.coordinates.source(files, top=pdb)

samples = [pyemma.coordinates.save_traj(reader, smpl, outfile=None, top=pdb)

for smpl in markov_samples]

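We now use MDTraj to back-calculate two experimental observables for each sampled configuration, the solvent accessible surface area (SASA) and the radius of gyration, and average each observable over the Markov states. Each average is a Markov-state proxy for the macroscopic experimental observable we want to predict.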

[ ]

# Compute solvent accessible surface area for all samples

markov_sasa_all = [shrake_rupley(sample, mode='residue')

for sample in samples]

# Compute radius of gyration for all samples

markov_rg_all = [compute_rg(sample) for sample in samples]

# Average over Markov states for both observables.

markov_average_trp_sasa = np.array(markov_sasa_all).mean(axis=1)[:, 0]

markov_average_rg = np.array(markov_rg_all).mean(axis=1)

print(markov_average_trp_sasa)

print(markov_average_rg)

Radius of gyration

[ ]

print('The average radius of gyration of penta-peptide is'

' {:.3f} nm'.format(msm.expectation(markov_average_rg)))

Because we estimated a Bayesian MSM, we can also quantify the uncertainty of our prediction as a standard deviation or a confidence interval. For this we use the sample_std and sample_conf methods:

[ ]

print('The standard deviation of our prediction of the average radius of gyration'

' of pentapeptide is {:.9f} nm'.format(

msm.sample_std('expectation', markov_average_rg)))

print('The {:d}% CI of our prediction of the average radius of gyration of'

' pentapeptide have the bounds ({:.5f}, {:.5f})'.format(

int(msm.conf * 100), *msm.sample_conf('expectation', markov_average_rg)))

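Our model is thus very confident in its prediction of the radius of gyration. This does not guarantee that the prediction is accurate, i.e. that it agrees with experimental measurements. If quantitative agreement with experiment is lacking, we can use the augmented Markov model (AMM) procedure to estimate an MSM that best balances experimental and simulation data.

Trp-fluorescence auto-correlation

Fluctuations of tryptophan fluorescence can be measured with spectroscopic techniques. These fluctuations depend mainly on the solvent accessible surface area (SASA) of the tryptophan residue. Above, we precomputed the SASA with the Shrake-Rupley algorithm; here we show it projected onto the first two TICA dimensions: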

[ ]

fig, ax = plt.subplots(figsize=(5, 4))

pyemma.plots.plot_contour(

*tica_concatenated[:, :2].T,

markov_average_trp_sasa[dtrajs_concatenated],

ax=ax,

mask=True,

cbar_label=r'Trp-1 SASA / nm$^2$')

ax.set_xlabel('IC 1')

ax.set_ylabel('IC 2')

fig.tight_layout()

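As with the stationary expectation (ensemble average) considered above, we can use the precomputed SASA vector to compute the autocorrelation function of the tryptophan fluorescence with the MSM correlation() method: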
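For a discrete model, the equilibrium autocorrelation of an observable a after k steps can be written as Σ_i π_i a_i (T^k a)_i. The sketch below evaluates this expression for a made-up 2-state model. It assumes the unnormalized, non-mean-subtracted convention and is meant as an illustration of the formula rather than a reimplementation of PyEMMA's method. All names and numbers are hypothetical.

```python
def autocorrelation(T, pi, a, n_steps):
    """Return [E[a(0) a(k)] for k = 0 .. n_steps] at equilibrium."""
    n = len(T)
    propagated = list(a)  # holds T^k a, starting with k = 0
    acf = []
    for _ in range(n_steps + 1):
        acf.append(sum(pi[i] * a[i] * propagated[i] for i in range(n)))
        propagated = [sum(T[i][j] * propagated[j] for j in range(n))
                      for i in range(n)]
    return acf


# Hypothetical 2-state model with its stationary distribution.
T = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]
# Hypothetical per-state observable, e.g. an average SASA in nm^2.
a = [1.0, 2.0]

acf = autocorrelation(T, pi, a, n_steps=50)
print(acf[0], acf[1], acf[-1])
```

The curve starts at the second moment of the observable (2.0 here) and decays towards the squared ensemble average (16/9 ≈ 1.78), so the measurable amplitude is only the difference between the two.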


[ ]

eq_time_ml, eq_acf_ml = mlmsm.correlation(markov_average_trp_sasa, maxtime=150)

eq_time_bayes, eq_acf_bayes = msm.sample_mean(

'correlation',

np.array(markov_average_trp_sasa),

maxtime=150)

eq_acf_bayes_ci_l, eq_acf_bayes_ci_u = msm.sample_conf(

'correlation',

np.array(markov_average_trp_sasa),

maxtime=150)

fig, ax = plt.subplots()

ax.plot(eq_time_ml, eq_acf_ml, '-o', color='C1', label='ML MSM')

ax.plot(

eq_time_bayes,

eq_acf_bayes,

'--x',

color='C0',

label='Bayes sample mean')

ax.fill_between(

eq_time_bayes,

eq_acf_bayes_ci_l[1],

eq_acf_bayes_ci_u[1],

facecolor='C0',

alpha=0.3)

ax.semilogx()

ax.set_xlim((eq_time_ml[1], eq_time_ml[-1]))

ax.set_xlabel(r'time / $\mathrm{ns}$')

ax.set_ylabel(r'Trp-1 SASA ACF / $\mathrm{nm}^4$')

ax.legend()

fig.tight_layout()

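Note the scale of the y-axis: given the experimental uncertainty, this amplitude may be too small to be measured.

With more advanced experimental setups, however, such as stopped-flow, T-jump or P-jump, we can prepare our ensemble in a non-equilibrium initial condition.

Say we can experimentally prepare a sample that occupies only metastable state $S_{1}$. The initial condition $p_{0}$ is then given by the metastable distribution of $S_{1}$. Using PyEMMA's relaxation() method, we can simulate the relaxation from this non-equilibrium initial condition $p_{0}$ back to equilibrium: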
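In a discrete model, the relaxation of an observable a from an initial distribution p0 is E[a(k)] = p0 T^k a. The sketch below evaluates this for a made-up 2-state model in which the system starts entirely in the second state. It illustrates the formula under these assumptions and is not PyEMMA's implementation.

```python
def relaxation(T, p0, a, n_steps):
    """Return [sum_i p_k[i] * a[i] for k = 0 .. n_steps], with p_k = p0 T^k."""
    n = len(T)
    p = list(p0)
    signal = []
    for _ in range(n_steps + 1):
        signal.append(sum(p[i] * a[i] for i in range(n)))
        # Propagate the distribution by one lag time.
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return signal


# Hypothetical 2-state model; its equilibrium distribution is (2/3, 1/3).
T = [[0.9, 0.1],
     [0.2, 0.8]]
# Hypothetical per-state observable and a non-equilibrium starting point.
a = [1.0, 2.0]
p0 = [0.0, 1.0]

signal = relaxation(T, p0, a, n_steps=50)
print(signal[0], signal[1], signal[-1])
```

The signal starts at 2.0 and relaxes to the ensemble average 4/3, an amplitude of about 0.67. The equilibrium autocorrelation of the same observable in the same model only varies by 2/9 ≈ 0.22.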


[ ]

eq_time_ml, eq_relax_ml = mlmsm.relaxation(

msm.metastable_distributions[0],

markov_average_trp_sasa,

maxtime=150)

eq_time_bayes, eq_relax_bayes = msm.sample_mean(

'relaxation',

msm.metastable_distributions[0],

np.array(markov_average_trp_sasa),

maxtime=150)

eq_relax_bayes_CI_l, eq_relax_bayes_CI_u = msm.sample_conf(

'relaxation',

msm.metastable_distributions[0],

np.array(markov_average_trp_sasa),

maxtime=150)

fig, ax = plt.subplots()

ax.plot(eq_time_ml, eq_relax_ml, '-o', color='C1', label='ML MSM')

ax.plot(

eq_time_bayes,

eq_relax_bayes,

'--x',

color='C0',

label='Bayes sample mean')

ax.fill_between(

eq_time_bayes,

eq_relax_bayes_CI_l[1],

eq_relax_bayes_CI_u[1],

facecolor='C0',

alpha=0.3)

ax.semilogx()

ax.set_xlim((eq_time_ml[1], eq_time_ml[-1]))

ax.set_xlabel(r'time / $\mathrm{ns}$')

ax.set_ylabel(r'Average Trp-1 SASA / $\mathrm{nm}^2$')

ax.legend()

fig.tight_layout()

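This signal is much stronger than the one we obtained from the equilibrium autocorrelation above!

If we compute the average observable for each metastable state and compare it with the ensemble average, we can tell which initial distribution gives the strongest signal. We compare these values below and report their absolute differences: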

[ ]

state2ensemble = np.abs(msm.expectation(markov_average_trp_sasa) -

msm.metastable_distributions.dot(np.array(markov_average_trp_sasa)))

DataFrame(np.round(state2ensemble, 3), index=range(1, nstates + 1), columns=[''])

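Note that preparing the system in metastable state $S_{1}$ gives the strongest signal, as indicated by the largest absolute difference between the observable in a single metastable state and the global ensemble average. Keep in mind, however, that other initial conditions may give an even stronger signal.

With the methods above, we can see how an MSM can be used to design experiments that help to validate and test the model.

Hidden Markov models

An alternative is to use hidden Markov models (HMMs) (noe-15). HMMs model the dynamics between so-called hidden states, which we initialize from the metastable states found by PCCA++. Because we do not assume Markovianity in the space of cluster centers, the estimate is less prone to discretization error. In addition, an HMM provides a natural coarse-graining into a given number of hidden states, which yields a closed-form Markov model of the metastable dynamics.

Assembling manuscript figures

In the following, we use the results of this notebook to assemble the figures.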

[ ]

from matplotlib.gridspec import GridSpec

from matplotlib.ticker import LogLocator

from matplotlib.cm import get_cmap

from pyemma.plots.markovtests import _add_ck_subplot

mpl.rcParams['axes.titlesize'] = 6

mpl.rcParams['axes.labelsize'] = 6

mpl.rcParams['legend.fontsize'] = 5

mpl.rcParams['xtick.labelsize'] = 5

mpl.rcParams['ytick.labelsize'] = 5

mpl.rcParams['xtick.minor.pad'] = 2

mpl.rcParams['xtick.major.pad'] = 3

mpl.rcParams['ytick.minor.pad'] = 2

mpl.rcParams['ytick.major.pad'] = 3

mpl.rcParams['axes.labelpad'] = 1

mpl.rcParams['lines.markersize'] = 4

This is Figure 3 (a,b,c,d) which sketches the system and coordinates part:

[ ]

fig = plt.figure(figsize=(3.47, 4.65))

gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))

gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))

gs = plt.GridSpec(gh, gw)

gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)

ax_box = fig.add_subplot(gs[:, :])

ax_box.set_axis_off()

ax_box.text(0.00, 0.95, '(a)', size=10, zorder=1)

ax_box.text(0.00, 0.58, '(b)', size=10)

ax_box.text(0.55, 0.58, '(c)', size=10)

ax_box.text(0.00, 0.22, '(d)', size=10)

ax_mol = fig.add_subplot(gs[:1600, -2820:-400])

ax_mol.set_axis_off()

ax_mol.imshow(plt.imread('data/pentapeptide-structure.png'))

ax_feat = fig.add_subplot(gs[2000:3150, 400:1800])

ax_feat.bar(

vamp_bars_plot['labels'],

vamp_bars_plot['scores'],

yerr=vamp_bars_plot['errors'],

color=['C0', 'C1', 'C2'])

ax_feat.set_ylabel('VAMP2 score')

ax_feat.set_title(r'lag time $\tau$ = {:.1f} ns'.format(vamp_bars_plot['lag'] * 0.1))

ax_feat.set_ylim(2.75, 3.65)

ax_feat.tick_params(axis='x', labelrotation=20)

ax_sample_free_energy = fig.add_subplot(gs[2000:3150, 2200:3350])

_, _, misc = pyemma.plots.plot_free_energy(

*tica_concatenated.T[:2],

ax=ax_sample_free_energy,

cax=fig.add_subplot(gs[1900:1950, 2200:3350]),

cbar_orientation='horizontal',

legacy=False)

misc['cbar'].set_label('sample free energy / kT')

misc['cbar'].set_ticks(np.arange(9))

misc['cbar'].ax.xaxis.set_ticks_position('top')

misc['cbar'].ax.xaxis.set_label_position('top')

ax_sample_free_energy.set_xlabel('IC 1')

ax_sample_free_energy.set_ylabel('IC 2')

x = 0.1 * np.arange(tica_output[0].shape[0])

ax_tic1 = fig.add_subplot(gs[3650:4000, 400:3350])

ax_tic2 = fig.add_subplot(gs[4000:4350, 400:3350])

ax_tic1.plot(x, tica_output[0][:, 0], linewidth=0.25)

ax_tic2.plot(x, tica_output[0][:, 1], linewidth=0.25)

ax_tic1.set_ylabel('IC 1')

ax_tic2.set_ylabel('IC 2')

ax_tic2.set_xlabel('time / ns')

fig.savefig('data/figure_3.pdf', dpi=300)

Next is Figure 4 (a,b) which shows estimation and validation:

[ ]

fig = plt.figure(figsize=(3.47, 2.60))

gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))

gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))

gs = plt.GridSpec(gh, gw)

gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)

ax_box = fig.add_subplot(gs[:, :])

ax_box.set_axis_off()

ax_box.text(0.00, 0.95, '(a)', size=10)

ax_box.text(0.00, 0.36, '(b)', size=10)

ax_its = fig.add_subplot(gs[50:1300, 400:3350])

pyemma.plots.plot_implied_timescales(its, units='ns', dt=0.1, ax=ax_its, nits=4, ylog=True)

ax_its.set_ylim(1, ax_its.get_ylim()[1])

ax_its.set_xlabel(r'lag time $\tau$ / ns')

ax_ck = [

fig.add_subplot(gs[1730:2300, 400:970]),

fig.add_subplot(gs[1730:2300, 995:1565]),

fig.add_subplot(gs[1730:2300, 1590:2160]),

fig.add_subplot(gs[1730:2300, 2185:2755]),

fig.add_subplot(gs[1730:2300, 2780:3350])]

for k, ax in enumerate(ax_ck):
    lest, lpred = _add_ck_subplot(
        cktest, ax, k, k, ipos=cktest.nsets - 1, dt=0.1, units='ns', linewidth=0.7)
    if k > 0:
        ax.set_yticks([])

predlabel = 'predict ({:3.1f}%)'.format(100.0 * cktest.conf)

estlabel = 'estimate'

ax_ck[-1].legend(

(lest[0], lpred[0]),

(estlabel, predlabel),

frameon=False,

loc='upper left',

bbox_to_anchor=(-0.4, 1.45))

fig.savefig('data/figure_4.pdf', dpi=300)

Figure 5 (a,b,c,d) highlights the basic analysis part using map plots:

[ ]

fig = plt.figure(figsize=(3.47, 3.95))

gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))

gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))

gs = plt.GridSpec(gh, gw)

gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)

ax_box = fig.add_subplot(gs[:, :])

ax_box.set_axis_off()

ax_box.text(0.03, 0.97, '(a)', size=10)

ax_box.text(0.53, 0.97, '(b)', size=10)

ax_box.text(0.03, 0.50, '(c)', size=10)

ax_box.text(0.53, 0.50, '(d)', size=10)

ax_fe = fig.add_subplot(gs[400:1750, 400:1750])

_, _, misc = pyemma.plots.plot_free_energy(

*tica_concatenated.T[:2],

weights=np.concatenate(msm.trajectory_weights()),

ax=ax_fe,

cax=fig.add_subplot(gs[300:350, 400:1750]),

cbar_orientation='horizontal',

legacy=False)

misc['cbar'].set_ticks(np.linspace(0, 8, 5))

misc['cbar'].ax.xaxis.set_ticks_position('top')

misc['cbar'].ax.xaxis.set_label_position('top')

misc['cbar'].set_label(r'free energy / $\mathrm{k}_\mathrm{B}T$')

ax_fe.set_ylabel('IC 2')

ax_fe.set_xticklabels([])

ax_state = fig.add_subplot(gs[400:1750, 2000:3350])

_, _, misc = pyemma.plots.plot_state_map(

*tica_concatenated.T[:2],

metastable_traj,

ax=ax_state,

cax=fig.add_subplot(gs[300:350, 2000:3350]),

cbar_label='metastable state',

cbar_orientation='horizontal')

misc['cbar'].ax.xaxis.set_ticks_position('top')

misc['cbar'].ax.xaxis.set_label_position('top')

misc['cbar'].set_ticklabels([r'$\mathcal{S}_%d$' % (i + 1)

for i in range(nstates)])

ax_state.set_xticklabels([])

ax_state.set_yticklabels([])

ax_state.text(0.70, 6.30, r'$\mathcal{S}_1$', size=10)

ax_state.text(3.00, 4.00, r'$\mathcal{S}_2$', size=10)

ax_state.text(2.50, 2.00, r'$\mathcal{S}_3$', size=10)

ax_state.text(5.20, -0.50, r'$\mathcal{S}_4$', size=10)

ax_state.text(-0.20, 0.00, r'$\mathcal{S}_5$', size=10)

evec_idx = 1

ax_eig = fig.add_subplot(gs[2300:3650, 400:1750])

_, _, misc = pyemma.plots.plot_contour(

*tica_concatenated.T[:2],

eigvec[dtrajs_concatenated, evec_idx],

cmap='PiYG',

ax=ax_eig,

mask=True,

cax=fig.add_subplot(gs[2200:2250, 400:1750]),

cbar_label='{}. right eigenvector'.format(evec_idx + 1),

cbar_orientation='horizontal')

misc['cbar'].set_ticks(np.linspace(*misc['cbar'].get_clim(), 3))

misc['cbar'].ax.xaxis.set_ticks_position('top')

misc['cbar'].ax.xaxis.set_label_position('top')

ax_eig.set_xlabel('IC 1')

ax_eig.set_ylabel('IC 2')

ax_flux = fig.add_subplot(gs[2300:3650, 2000:3350])

_, _, misc = pyemma.plots.plot_contour(

*tica_concatenated.T[:2],

flux.committor[dtrajs_concatenated],

cmap='brg',

ax=ax_flux,

mask=True,

cax=fig.add_subplot(gs[2200:2250, 2000:3350]),

cbar_label=r'committor $\mathcal{S}_%d \to \mathcal{S}_%d$' % (

start + 1, final + 1),

cbar_orientation='horizontal')

misc['cbar'].set_ticks(np.linspace(0, 1, 3))

misc['cbar'].set_ticklabels(['start', 'transition state', 'final'])

misc['cbar'].ax.xaxis.set_ticks_position('top')

misc['cbar'].ax.xaxis.set_label_position('top')

ax_flux.set_xlabel('IC 1')

ax_flux.set_yticklabels([])

fig.savefig('data/figure_5.pdf', dpi=300)
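
The figure cells in this section all place their panels by slicing a very fine GridSpec (1000 cells per inch, zero spacing, filling the whole figure), so a slice such as `gs[400:1750, 400:1750]` is effectively a box given in thousandths of an inch from the top-left corner. The following is a minimal sketch of that arithmetic in plain Python; the helper name `grid_box` is hypothetical and not part of Matplotlib or PyEMMA. The tuple it returns has the `[left, bottom, width, height]` layout that `fig.add_axes` accepts.

```python
import math


def grid_box(fig_w_in, fig_h_in, rows, cols, cells_per_inch=1000):
    """Return (left, bottom, width, height) in figure fractions for a grid slice.

    Assumes a grid that fills the whole figure with zero spacing, as set up by
    gs.update(hspace=0, wspace=0, left=0, right=1, bottom=0, top=1).
    """
    gw = int(math.floor(0.5 + cells_per_inch * fig_w_in))  # number of columns
    gh = int(math.floor(0.5 + cells_per_inch * fig_h_in))  # number of rows
    left = cols.start / gw
    width = (cols.stop - cols.start) / gw
    # Row 0 is the top of the figure, so the bottom edge is measured from the top.
    bottom = 1.0 - rows.stop / gh
    height = (rows.stop - rows.start) / gh
    return left, bottom, width, height


# Panel (a) of Figure 5: rows 400..1750 and columns 400..1750 on a 3.47 x 3.95 inch canvas
box_a = grid_box(3.47, 3.95, slice(400, 1750), slice(400, 1750))
width_in = box_a[2] * 3.47   # panel width in inches
height_in = box_a[3] * 3.95  # panel height in inches
print(box_a, width_in, height_in)
```

Working in physical units like this keeps all panels the same size (here 1.35 inch squares) regardless of the overall figure dimensions, which is convenient when the figure width is fixed by a journal column.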


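Figure 6 visualizes the $\mathcal{S}_2 \to \mathcal{S}_4$ committor: the coarse-grained reactive flux is drawn with each metastable state placed along the horizontal axis at its forward committor value, next to a rendering of its structure: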


[ ]

fig = plt.figure(figsize=(3.47, 1.804))
gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))
gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))
gs = plt.GridSpec(gh, gw)
gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)
ax_box = fig.add_subplot(gs[:, :])
ax_box.set_axis_off()

# x position of each state = its forward committor; y positions are chosen by hand
posx = cgflux.forward_committor
posy = 0.1 * np.array([4.5, 2.3, 2.3, 7, 4.5])
pos = np.vstack([posx, posy]).T

# Size (w, h) and top-left anchor (x, y) of each structure image, in grid cells
sizes = [[385, 432], [388, 526], [347, 500], [367, 348], [260, 374]]
anchors = [[1120, 950], [220, 500], [1630, 830], [2800, 600], [1600, 50]]
for i, ((w, h), (x, y)) in enumerate(zip(sizes, anchors)):
    ax_ = fig.add_subplot(gs[y:y+h, x:x+w])
    ax_.set_axis_off()
    ax_.text(0, 500, r'$\mathcal{S}_' + '{}$'.format(i + 1))
    ax_.imshow(plt.imread('static/hmm-backbone-{}-{}x{}.png'.format(i + 1, w, h)))

# Coarse-grained flux network drawn over the structure images
flux_ax = fig.add_subplot(gs[:1500, :])
pyemma.plots.plot_flux(
    cgflux,
    pos=pos,
    ax=flux_ax,
    state_sizes=np.array([1.5 for _ in range(len(cgflux.stationary_distribution))]),
    state_colors='None',
    max_width=15,
    max_height=15,
    minflux=2e-5,
    arrow_scale=1.,
    size=5,
    state_labels=None,
    show_committor=True)
flux_ax.set_xticks(np.arange(0, 1.2, .2))

fig.savefig('data/figure_6.pdf', dpi=300)
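
The forward committor used in these figures (`flux.committor` in Figure 5 (d), `cgflux.forward_committor` for the horizontal positions in Figure 6) is the probability that the system, starting from a given state, reaches the target set before returning to the source set. As a rough sketch, independent of PyEMMA, it can be computed for a small transition matrix by solving a linear system. The 4-state matrix `P`, the function name `forward_committor`, and the choice of source and target states are illustrative assumptions, not the pentapeptide model.

```python
import numpy as np


def forward_committor(P, source, target):
    """Forward committor q+ of a row-stochastic matrix P.

    q+ is 0 on the source set, 1 on the target set, and satisfies
    q_i = sum_j P_ij q_j on all remaining (intermediate) states.
    """
    n = P.shape[0]
    source, target = list(source), list(target)
    inter = [i for i in range(n) if i not in source and i not in target]
    q = np.zeros(n)
    q[target] = 1.0
    if inter:
        # (P_II - I) q_I = -P_IT 1, restricted to the intermediate states
        lhs = P[np.ix_(inter, inter)] - np.eye(len(inter))
        rhs = -P[np.ix_(inter, target)].sum(axis=1)
        q[inter] = np.linalg.solve(lhs, rhs)
    return q


# Toy 4-state chain with nearest-neighbour hops (hypothetical numbers)
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
q_plus = forward_committor(P, source=[0], target=[3])
print(q_plus)
```

For this symmetric chain the committor rises linearly from the source to the target state; a value near 0.5 marks the transition region, which is why the colorbar of Figure 5 (d) labels its midpoint as the transition state.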


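Finally, Figure 7 depicts (a) the Trp-1 autocorrelation function and (b) the corresponding relaxation curve, each for the maximum-likelihood (ML) MSM and the Bayesian MSM with its confidence band: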


[ ]

fig = plt.figure(figsize=(3.47, 2.5))
gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))
gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))
gs = plt.GridSpec(gh, gw)
gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)
ax_box = fig.add_subplot(gs[:, :])
ax_box.set_axis_off()
ax_box.text(0.01, 0.95, '(a)', size=10)
ax_box.text(0.01, 0.50, '(b)', size=10)

ax_acf = fig.add_subplot(gs[50:1075, 500:3400])
ax_acf.plot(eq_time_ml, eq_acf_ml, '-', color='C1', label='ML MSM')
ax_acf.plot(
    eq_time_bayes,
    eq_acf_bayes,
    '--',
    color='C0',
    label='Bayesian MSM')
ax_acf.fill_between(
    eq_time_bayes,
    eq_acf_bayes_ci_l[1],
    eq_acf_bayes_ci_u[1],
    facecolor='C0',
    alpha=0.3)
ax_acf.semilogx()
ax_acf.set_xlim((eq_time_ml[1], eq_time_ml[-1]))
ax_acf.set_xticks([])
ax_acf.set_ylabel(r'ACF / nm$^4$')
ax_acf.legend()

ax_rlx = fig.add_subplot(gs[1125:2150, 500:3400])
ax_rlx.plot(eq_time_ml, eq_relax_ml, '-', color='C1', label='ML MSM')
ax_rlx.plot(
    eq_time_bayes,
    eq_relax_bayes,
    '--',
    color='C0',
    label='Bayesian MSM')
ax_rlx.fill_between(
    eq_time_bayes,
    eq_relax_bayes_CI_l[1],
    eq_relax_bayes_CI_u[1],
    facecolor='C0',
    alpha=0.3)
ax_rlx.semilogx()
ax_rlx.set_ylabel(r'Average / nm$^2$')
ax_rlx.set_xlim((eq_time_ml[1], eq_time_ml[-1]))
ax_rlx.set_xlabel(r'time / ns')

fig.savefig('data/figure_7.pdf', dpi=300)
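
The two curves in Figure 7 are assumed to come from an earlier cell (arrays such as `eq_acf_ml` and `eq_relax_ml`). To show what such curves mean, here is a minimal NumPy sketch of how an equilibrium autocorrelation and a relaxation curve follow from a transition matrix and a per-state observable. The 2-state matrix `P`, the observable `a`, the initial distribution `p0`, and the function names are hypothetical and chosen only so the result can be checked by hand.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()


def msm_acf(P, a, maxk):
    """Equilibrium autocorrelation E[a(0) a(k)] = sum_ij pi_i a_i (P^k)_ij a_j for k = 0..maxk."""
    pi = stationary_distribution(P)
    acf = np.empty(maxk + 1)
    pk_a = np.asarray(a, dtype=float).copy()  # holds P^k a, starting with k = 0
    for k in range(maxk + 1):
        acf[k] = np.dot(pi * a, pk_a)
        pk_a = P @ pk_a
    return acf


def msm_relaxation(P, p0, a, maxk):
    """Expectation of a after k steps when starting from distribution p0: p0^T P^k a."""
    out = np.empty(maxk + 1)
    pk = np.asarray(p0, dtype=float).copy()  # holds p0^T P^k, starting with k = 0
    for k in range(maxk + 1):
        out[k] = np.dot(pk, a)
        pk = pk @ P
    return out


# Toy 2-state model; times are in units of the lag time (hypothetical numbers)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
a = np.array([1.0, 0.0])    # observable value in each state
p0 = np.array([0.0, 1.0])   # start entirely in the second state
acf = msm_acf(P, a, maxk=50)
relax = msm_relaxation(P, p0, a, maxk=50)
print(acf[:3], relax[:3])
```

At long times the autocorrelation decays to the squared equilibrium mean and the relaxation curve approaches the equilibrium mean itself, which matches the units on the two y-axes of Figure 7 (the square of the observable's unit in panel (a), the observable's unit in panel (b)).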


Next steps

Move from peptides to proteins: build and analyze MSMs for larger systems...
