Bohrium
robot
新建

空间站广场

论文
Notebooks
比赛
课程
Apps
我的主页
我的Notebooks
我的论文库
我的足迹

我的工作空间

任务
节点
文件
数据集
镜像
项目
数据库
公开
OpenMM Tutorial 1:Ligand-Protein Complex Parameterization and Simulaition
Molecular Dynamics
English
notebook
protein
Molecular DynamicsEnglishnotebookprotein
Yanze Wang
发布于 2023-09-23
推荐镜像 :Basic Image:ubuntu18.04-py3.10-mamba-cuda10.2
推荐机型 :c12_m46_1 * NVIDIA GPU B
rial(v1)

OpenMM Tutorial 1:Ligand-Protein Complex Parameterization and Simulation

代码
文本

Introduction

Still no idea how to make real molecular dynamics (aka MD) codes run? Still struggle on analysis of MD data?

No more confusion! Let's kick off simulations here!

This is a part of a MD tutorial series which walks you through how MD simulations work and how you can make it in your real research. I will try my best cover all the details involved and make simulations ready-to-publish level.

Now, enjoy!

But hold on, before everything starts, let's move steps to some samll questions:

代码
文本

Why OpenMM

There are many highly efficient molecular dynamics softwares so far, like Gromacs, Amber, OpenMM, LAMMPS, NAMD etc. They have different features and different user community respectively. They also perform differently. There is a benchmark of these tools https://mdbench.ace-net.ca/mdbench/

OK, you may have already noticed that OpenMM is not the fastest tool. That's true, but frankly, OpenMM is the easiest and the most flexible tool for you to understand as a beginner about every details through MD simulation. Most importantly, it's easy to install!

代码
文本

Environment Setup

To install openmm and some auxiliary analysis tools, use the command below:

  • openmm: the OpenMM package, the latest version by default;
  • ambertools: AmberTools developed by Amber to build molecular model and topology;
  • MDAnalysis: Python package to faciliate MD analysis. Alternatively, you may use mdtraj
代码
文本
[26]
!mamba install -c conda-forge openmm MDAnalysis ambertools -y > log
代码
文本
[ ]

代码
文本

Example System

Here we use T4 lysozyme L99A/M102Q as model system to develop this tutorial.

Download files through:

代码
文本
[27]
%%bash
if ! [ -e 3htb.pdb ]; then # the protein
cp /bohr/T4-jz4-complex-nwwe/v1/3htb.pdb ./
elif [ -d DeePMD-kit_Tutorial ]; then
echo "file 3htb.pdb exists "
fi
if ! [ -e jz4.sdf ]; then # the ligand
cp /bohr/T4-jz4-complex-nwwe/v1/jz4.sdf ./
elif [ -d DeePMD-kit_Tutorial ]; then
echo "file jz4.sdf exists"
fi
代码
文本
[28]
!ls -alhrt
total 7.5M
-rw-r--r-- 1 4294967294 4294967294  85K Dec  4  2022 DMFF_benzene_rdf_optimization.ipynb
-rw-r--r-- 1 4294967294 4294967294    0 Feb 16  2023 lg.ipynb
drwxr-xr-x 2 4294967294 4294967294 4.0K Jun  2 20:33 .ipynb_checkpoints
drwxr-xr-x 4 4294967294 4294967294 4.0K Jun  2 21:05 data
-rw-r--r-- 1 4294967294 4294967294 237K Jun  2 21:08 fes_converge.ipynb
drwxr-xr-x 3 4294967294 4294967294 4.0K Jun 28 09:28 prot
-rwxr-xr-x 1 4294967294 4294967294 141K Sep 23 11:43 3htb.pdb
-rwxr-xr-x 1 4294967294 4294967294  34K Sep 23 11:43 jz4.sdf
-rw-r--r-- 1 4294967294 4294967294 1.3K Sep 23 11:44 sqm.in
-rw-r--r-- 1 4294967294 4294967294 7.8K Sep 23 11:44 sqm.out
-rw-r--r-- 1 4294967294 4294967294 1.7K Sep 23 11:44 sqm.pdb
-rw-r--r-- 1 4294967294 4294967294 2.5K Sep 23 11:44 jz4.gaff2.mol2
-rw-r--r-- 1 4294967294 4294967294 7.0K Sep 23 11:44 ANTECHAMBER.FRCMOD
-rw-r--r-- 1 4294967294 4294967294 1.9K Sep 23 11:44 jz4.gaff2.frcmod
-rw-r--r-- 1 4294967294 4294967294 4.0K Sep 23 11:44 stdout_renum.txt
-rw-r--r-- 1 4294967294 4294967294    0 Sep 23 11:44 stdout_nonprot.pdb
-rw-r--r-- 1 4294967294 4294967294    0 Sep 23 11:44 stdout_sslink
-rw-r--r-- 1 4294967294 4294967294 105K Sep 23 11:44 3htb.amber.pdb
-rw-r--r-- 1 4294967294 4294967294  880 Sep 23 11:45 tleap.in
-rw-r--r-- 1 4294967294 4294967294 1.2M Sep 23 11:46 complex.inpcrd
-rw-r--r-- 1 4294967294 4294967294 5.7M Sep 23 11:46 complex.prmtop
-rw-r--r-- 1 4294967294 4294967294  17K Sep 23 11:46 leap.log
drwxr-xr-x 1 root       root       4.0K Sep 23 11:51 ..
-rw-r--r-- 1 4294967294 4294967294  293 Sep 23 11:58 conda.log
d--x--x--x 5 root       root       4.0K Sep 23 12:02 .
-rw-r--r-- 1 4294967294 4294967294  171 Sep 23 12:53 log
代码
文本

Parameterization

Prepare for small molecule ligands

We generally store lignds in a .sdf or .mol2 file format. Do NOT use PDB format for small molecules.
If you download a protein-ligand complex PDB file from PDBBank, you can load your it by PyMol and export them to .sdf format.

Before exporting your ligand, use PyMol to add hydrogens to get a complete structure. Do NOT add hydrogens to protein at this time, or you may get conflicts later on.

As we already prepared them for you, we will move on with the files we downloaded from the dataset.

代码
文本

Parameterization of Ligand

In this case, we choose GAFF (General AMBER Force Field) for parameterization. There are GAFF and GAFF2 available now, but they are basically the same for our purpose, you can choose either one. However, GAFF2 is newly revised and more popular now.

We will use ambertools to parameterize our small molecules here.

antechamber[http://ambermd.org/antechamber/example.html] helps us calculate the partial charges on each atom. This may take several minutes since every partial charge will be calculated by semi-empirical QM methods (AM1-BCC, -c bcc flag). specify -nc 0 since the ligand is neutral.

parmchk2 can assign parameters to the ligands according to the atom types. You can get the parameter file jz4.gaff2.frcmod

代码
文本
[29]
# antechamber helps you calculate the charges of
!antechamber -i jz4.sdf -fi sdf -o jz4.gaff2.mol2 -fo mol2 -rn JZ4 -at gaff2 -an yes -dr no -pf yes -c bcc -nc 0
!parmchk2 -i jz4.gaff2.mol2 -f mol2 -o jz4.gaff2.frcmod -s gaff -a yes
Welcome to antechamber 22.0: molecular input file processor.

Info: The atom type is set to gaff2; the options available to the -at flag are
      gaff, gaff2, amber, bcc, and sybyl.

Info: Total number of electrons: 74; net charge: 0

Running: /opt/mamba/bin/sqm -O -i sqm.in -o sqm.out

代码
文本

If you get an error, please make sure you have added hydrogens to the ligand in the previous step. Otherwise, the net charge is incorrect.

代码
文本
[ ]

代码
文本

Parameterization of Protein and Complex

Prepare the Amber-readable Protein PDB File.

To get a ready-to-simulation protein pdb files, we need to modify a bit the PDB files from PDBBank. Here are some general tips:

  • remove all crystal waters
  • remove unnecessary cofactors (ligands, ions, etc)
  • add missing residues
  • correct residue names

As the file downloaded is already processed, we don't need to do it here.

Before, pass it into ambertools, we need to convert it to amber-readable PDB. Basically this step will correct some residue names and add some missing atoms.

代码
文本
[30]
!pdb4amber 3htb.pdb > 3htb.amber.pdb
==================================================
Summary of pdb4amber for: 3htb.pdb
===================================================

----------Chains
The following (original) chains have been found:
A

---------- Alternate Locations (Original Residues!))

The following residues had alternate locations:
MET_1
THR_21
TYR_24
ASN_68
ASP_72
ARG_80
ARG_96
GLN_102
LYS_147
ARG_154
-----------Non-standard-resnames


---------- Missing heavy atom(s)

None
The alternate coordinates have been discarded.
Only the first occurrence for each atom was kept.
代码
文本

Prepare topology by tLEaP

LEaP is a powerful modeling tool in Ambertool. xLEaP enables you run scripts in a graphical window. If only terminal line is available, use tLEaP instead.

Now let’s prepare the input script of tLEaP. Use the text below and name the file as 3htb.tleap.in:

代码
文本
[31]
tleap_in = """
source leaprc.protein.ff14SB #Source leaprc file for ff14SB protein force field
source leaprc.gaff2 #Source leaprc file for gaff
source leaprc.water.tip3p #Source leaprc file for TIP3P water model
loadamberparams jz4.gaff2.frcmod #Load the additional frcmod file for ligand
lig = loadmol2 jz4.gaff2.mol2
mol = loadpdb 3htb.amber.pdb #Load PDB file for protein-ligand complex
complex = combine {lig mol} # Combine two parts
solvatebox complex TIP3PBOX 12 #Solvate the complex with a cubic water box
addionsrand complex Na+ 0 #Add Na+ ions to neutralize the system
addionsrand complex Cl- 0 #Add Cl- ions to neutralize the system
addionsrand complex Na+ 10 #Add Na+ ions to increase ion strength
addionsrand complex Cl- 10 #Add Cl- ions to increase ion strength
saveamberparm complex complex.prmtop complex.inpcrd #Save AMBER topology and coordinate files
quit #Quit tleap program
"""
with open("tleap.in", "w") as fp:
fp.write(tleap_in)
代码
文本

Here we get the topology file complex.prmtop and the coordinate file complex.inpcrd.

The topology file contains the connectivity information, charges, force field parameters, etc;

The coordinate file contains the Cartesian coordinates of each atom.

代码
文本
[32]
# run
!tleap -f tleap.in
-I: Adding /opt/mamba/dat/leap/prep to search path.
-I: Adding /opt/mamba/dat/leap/lib to search path.
-I: Adding /opt/mamba/dat/leap/parm to search path.
-I: Adding /opt/mamba/dat/leap/cmd to search path.
-f: Source tleap.in.

Welcome to LEaP!
(no leaprc in search path)
Sourcing: ./tleap.in
----- Source: /opt/mamba/dat/leap/cmd/leaprc.protein.ff14SB
----- Source of /opt/mamba/dat/leap/cmd/leaprc.protein.ff14SB done
Log file: ./leap.log
Loading parameters: /opt/mamba/dat/leap/parm/parm10.dat
Reading title:
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ff14SB
Reading force field modification type file (frcmod)
Reading title:
ff14SB protein backbone and sidechain parameters
Loading library: /opt/mamba/dat/leap/lib/amino12.lib
Loading library: /opt/mamba/dat/leap/lib/aminoct12.lib
Loading library: /opt/mamba/dat/leap/lib/aminont12.lib
----- Source: /opt/mamba/dat/leap/cmd/leaprc.gaff2
----- Source of /opt/mamba/dat/leap/cmd/leaprc.gaff2 done
Log file: ./leap.log
Loading parameters: /opt/mamba/dat/leap/parm/gaff2.dat
Reading title:
AMBER General Force Field for organic molecules (Version 2.2.20, March 2021)
----- Source: /opt/mamba/dat/leap/cmd/leaprc.water.tip3p
----- Source of /opt/mamba/dat/leap/cmd/leaprc.water.tip3p done
Loading library: /opt/mamba/dat/leap/lib/atomic_ions.lib
Loading library: /opt/mamba/dat/leap/lib/solvents.lib
Loading parameters: /opt/mamba/dat/leap/parm/frcmod.tip3p
Reading force field modification type file (frcmod)
Reading title:
This is the additional/replacement parameter set for TIP3P water
Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ions1lm_126_tip3p
Reading force field modification type file (frcmod)
Reading title:
Li/Merz ion parameters of monovalent ions for TIP3P water model (12-6 normal usage set)
Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ionsjc_tip3p
Reading force field modification type file (frcmod)
Reading title:
Monovalent ion parameters for Ewald and TIP3P water from Joung & Cheatham JPCB (2008)
Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ions234lm_126_tip3p
Reading force field modification type file (frcmod)
Reading title:
Li/Merz ion parameters of divalent to tetravalent ions for TIP3P water model (12-6 normal usage set)
Loading parameters: ./jz4.gaff2.frcmod
Reading force field modification type file (frcmod)
Reading title:
Remark line goes here
Loading Mol2 file: ./jz4.gaff2.mol2
Reading MOLECULE named JZ4
Loading PDB file: ./3htb.amber.pdb
  Added missing heavy atom: .R<CASN 163>.A<OXT 15>
  total atoms in file: 1300
  Leap added 1314 missing atoms according to residue templates:
       1 Heavy
       1313 H / lone pairs
  Solute vdw bounding box:              43.434 44.621 52.972
  Total bounding box for atom centers:  67.434 68.621 76.972
  Solvent unit box:                     18.774 18.774 18.774
  Total vdw box size:                   70.504 71.808 79.796 angstroms.
  Volume: 403987.403 A^3 
  Total mass 198988.730 amu,  Density 0.818 g/cc
  Added 10010 residues.

/opt/mamba/bin/teLeap: Warning!
addIonsRand: 1st Ion & target unit have charges of the same sign:
     unit charge = 6; ion1 charge = 1;
     can't neutralize.
6 Cl- ions required to neutralize.
Adding 6 counter ions to "complex". 10004 solvent molecules will remain.
0: Placed Cl- in complex at (19.40, 8.70, 10.53).
0: Placed Cl- in complex at (10.00, -22.19, -26.97).
0: Placed Cl- in complex at (31.42, -18.51, -34.60).
0: Placed Cl- in complex at (-2.85, 0.66, -24.86).
0: Placed Cl- in complex at (-17.07, -15.26, 21.14).
0: Placed Cl- in complex at (0.91, 27.90, 35.47).
Adding 10 counter ions to "complex". 9994 solvent molecules will remain.
0: Placed Na+ in complex at (-7.96, 8.61, 24.31).
0: Placed Na+ in complex at (-5.24, 17.69, 31.78).
0: Placed Na+ in complex at (-29.58, -32.79, -37.63).
0: Placed Na+ in complex at (24.50, -14.55, 11.93).
0: Placed Na+ in complex at (-9.80, 14.45, -10.58).
0: Placed Na+ in complex at (24.09, -17.04, -19.71).
0: Placed Na+ in complex at (1.11, 17.65, 34.45).
0: Placed Na+ in complex at (4.48, -23.14, -6.77).
0: Placed Na+ in complex at (-17.93, -27.81, 7.55).
0: Placed Na+ in complex at (-15.41, -4.75, -11.15).
Adding 10 counter ions to "complex". 9984 solvent molecules will remain.
0: Placed Cl- in complex at (20.51, 32.03, 35.45).
0: Placed Cl- in complex at (-9.59, 18.42, -13.25).
0: Placed Cl- in complex at (-33.44, 20.78, 11.72).
0: Placed Cl- in complex at (-9.17, -14.18, 23.44).
0: Placed Cl- in complex at (9.51, -24.63, 37.59).
0: Placed Cl- in complex at (27.21, 20.13, -24.83).
0: Placed Cl- in complex at (-16.46, -12.76, -17.80).
0: Placed Cl- in complex at (-31.11, 16.55, 0.48).
0: Placed Cl- in complex at (12.65, -18.51, 21.72).
0: Placed Cl- in complex at (31.17, -32.78, -30.53).
Checking Unit.
Building topology.
Building atom parameters.
Building bond parameters.
Building angle parameters.
Building proper torsion parameters.
Building improper torsion parameters.
 total 534 improper torsions applied
Building H-Bond parameters.
Incorporating Non-Bonded adjustments.
Not Marking per-residue atom chain types.
Marking per-residue atom chain types.
  (Residues lacking connect0/connect1 - 
   these don't have chain types marked:

	res	total affected

	CASN	1
	JZ4	1
	NMET	1
	WAT	9984
  )
 (no restraints)
	Quit

Exiting LEaP: Errors = 0; Warnings = 1; Notes = 0.
代码
文本

Let's Simulate!

代码
文本

You can refer to [http://docs.openmm.org/7.6.0/userguide/application/02_running_sims.html] OpenMM documents for more details. But here we will go beyond that.

代码
文本

For a standard MD simultion, we need the following steps:

  1. Energy Minimization

    the structures we built above is way far from the stable equilbrium states. The water molecules are randomly placed and some atoms have bad contacts or overlaps with each other. If we start directly from this file, we can end up with numerical explosion. (Typically nan energy, velocities and coordinates)

  2. NVT Equilibrium (heating)

    We don't have the velocity information right now. One way is to assign velocities to each atom according to Boltzman distribution. However, this can give a super large energy state of the system which could fail simulations. A safer way is to increase the temperature gradually by coupling the system to a heat bath and monitor the temperature till equilibrium. This algorithm can be achieved by Langebin dynamics or other heat bath.

  3. NPT Equilibrium (pressurizing)

    NPT is a more realistic ensemble to real situations. To this end, we need to couple the system to a barostat. Again, we need a couple of picoseconds or nanoseconds to allow the systems reach a equilibrium. The pressure of micro particle system is calculated by Virial tensor. However, pressure is actually a macro property of a huge bunch of particles that can vary drastically in micro systems. Therefore, instead of monitoring pressure, we will monitor the density as a proxy.

  4. Production Run under NPT

    Finally, after all equilibrium steps, we will propagate our system for a fairly long time scale to sample the desired phase space extensively.

代码
文本
[33]
import openmm as mm
import openmm.app as app
import openmm.unit as unit
from sys import stdout
代码
文本

Now let's load the files we have prepared.

代码
文本
[34]
# load AMBER format files
prmtop = app.AmberPrmtopFile('complex.prmtop')
inpcrd = app.AmberInpcrdFile('complex.inpcrd')

代码
文本

Then create a system based the topology. We will allow periodic boundary condition by specify nonbondedMethod=PME.

PME stands for Particle Mesh Ewald summantion methods. It uses fast fourior transormation to calculate slowly converged long-range interactions rapidly.

We need to choose an integrator to propagate the system. Instead of standard VV algorithm, we choose LagevinMiddleIntegrator to couple the system to heat bath in the integrator level.

代码
文本
[35]
system = prmtop.createSystem(nonbondedMethod=app.PME, nonbondedCutoff=1*unit.nanometer,
constraints=app.HBonds)
integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)

# create simulation instance.
simulation = app.Simulation(prmtop.topology, system, integrator)
simulation.context.setPositions(inpcrd.positions)
if inpcrd.boxVectors is not None:
simulation.context.setPeriodicBoxVectors(*inpcrd.boxVectors)

代码
文本

Energy minimization

代码
文本
[36]
simulation.minimizeEnergy()
代码
文本

NVT Equilibrium

代码
文本
[38]
#Warm up systems. Slowly warm up temperature - every 1000 steps raise the temperature by 5 K, ending at 300 K
simulation.reporters.append(
app.StateDataReporter(
stdout,
5000,
step=True,
potentialEnergy=True,
temperature=True,
volume=True,
density=True
)
)
simulation.context.setVelocitiesToTemperature(5*unit.kelvin)
print('Warming up the system...')
T = 5
mdsteps = 50000
stages = 60
steps_per_stage = int(mdsteps//stages)
for i in range(60):
temperature = (T+(i*T))*unit.kelvin
if i%20 ==0:
print(f"heating to {temperature}")
integrator.setTemperature(temperature)
simulation.step(steps_per_stage)
# give some extra steps to maintain temperature equilibrium
print(f"Now it's around 300K")
simulation.step(5001)
Warming up the system...
heating to 5 K
#"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)","Density (g/mL)"
5000,-528112.6655548871,24.698794077986207,403.9874031868558,0.8192689923107143
10000,-522999.79055488715,54.574706776363364,403.9874031868558,0.8192689923107143
15000,-517545.66555488715,83.88603284208452,403.9874031868558,0.8192689923107143
heating to 105 K
20000,-511602.29055488715,114.47304457150372,403.9874031868558,0.8192689923107143
25000,-503905.91555488715,144.8499855913975,403.9874031868558,0.8192689923107143
30000,-495757.04055488715,174.43934329962534,403.9874031868558,0.8192689923107143
heating to 205 K
35000,-483894.91555488715,202.3606573622087,403.9874031868558,0.8192689923107143
40000,-466474.41555488715,232.6962741209631,403.9874031868558,0.8192689923107143
45000,-447218.66555488715,263.0941494911721,403.9874031868558,0.8192689923107143
50000,-430301.41555488715,292.96518588481234,403.9874031868558,0.8192689923107143
代码
文本

Now you can see we heat the systems to 300K.

Under NVT ensemble, you can easily see that the box volumn remains constant.

代码
文本

NPT Equilibrium

代码
文本
[ ]
#NPT equilibration, reducing backbone constraints
mdsteps = 50000
barostat = system.addForce(mm.MonteCarloBarostat(1*unit.atmosphere, 300*unit.kelvin))
simulation.context.reinitialize(True)
print('Running NPT equilibration...')
simulation.step(mdsteps)
代码
文本

Now you can see we compress the systems around a constant density.

Under NPT ensemble, the box, temperature and density fluctuate around a specific value during simulation. It's not as strict as the statistical therodynamics requirement due to the numerical issues.

代码
文本

Production Run

代码
文本
[ ]
simulation.reporters[-1] = app.StateDataReporter(
stdout,
5000,
step=True,
potentialEnergy=True,
temperature=True,
volume=True,
density=True
)
simulation.reporters.append(
app.DCDReporter(('10ns_traj.dcd'), 5000))
mdsteps = 50000
print('Running Production...')
simulation.step(mdsteps)
print('Done!')
代码
文本
Molecular Dynamics
English
notebook
protein
Molecular DynamicsEnglishnotebookprotein
点个赞吧
本文被以下合集收录
MD
bohr61096f
更新于 2024-08-27
47 篇0 人关注
Reading
m1self
更新于 2024-07-26
5 篇0 人关注
推荐阅读
公开
Getting started with DMFF: A comprehensive beginner's tutorial
DMFFEnglishMD Tutorial
DMFFEnglishMD Tutorial
Feng Wei
发布于 2023-09-27
1 转存文件
公开
From DFT to MD: A Comprehensive 「Deep Potential」 Guide to Getting Started with Materials Computation
notebook
notebook
Letian
更新于 2024-07-10
1 转存文件
评论
 !mamba install -c co...

Hui_Zhou

09-26 00:28
安装过程比较久,是否考虑制作一个镜像来避免每次运行时的环境重复安装呢

Bohrium小助手

09-26 09:33
原则上我们有openmm,mdanalysis,ambertools的镜像但没有一起的?可以搞个大MD镜像
评论