OpenMM Tutorial 1:Ligand-Protein Complex Parameterization and Simulation
Introduction
Still no idea how to make real molecular dynamics (aka MD) codes run? Still struggle on analysis of MD data?
No more confusion! Let's kick off simulations here!
This is a part of a MD tutorial series which walks you through how MD simulations work and how you can make it in your real research. I will try my best cover all the details involved and make simulations ready-to-publish level.
Now, enjoy!
But hold on, before everything starts, let's move steps to some samll questions:
Why OpenMM
There are many highly efficient molecular dynamics softwares so far, like Gromacs, Amber, OpenMM, LAMMPS, NAMD etc. They have different features and different user community respectively. They also perform differently. There is a benchmark of these tools https://mdbench.ace-net.ca/mdbench/
OK, you may have already noticed that OpenMM is not the fastest tool. That's true, but frankly, OpenMM is the easiest and the most flexible tool for you to understand as a beginner about every details through MD simulation. Most importantly, it's easy to install!
Environment Setup
To install openmm and some auxiliary analysis tools, use the command below:
openmm
: the OpenMM package, the latest version by default;ambertools
: AmberTools developed by Amber to build molecular model and topology;MDAnalysis
: Python package to faciliate MD analysis. Alternatively, you may usemdtraj
Example System
Here we use T4 lysozyme L99A/M102Q as model system to develop this tutorial.
Download files through:
total 7.5M -rw-r--r-- 1 4294967294 4294967294 85K Dec 4 2022 DMFF_benzene_rdf_optimization.ipynb -rw-r--r-- 1 4294967294 4294967294 0 Feb 16 2023 lg.ipynb drwxr-xr-x 2 4294967294 4294967294 4.0K Jun 2 20:33 .ipynb_checkpoints drwxr-xr-x 4 4294967294 4294967294 4.0K Jun 2 21:05 data -rw-r--r-- 1 4294967294 4294967294 237K Jun 2 21:08 fes_converge.ipynb drwxr-xr-x 3 4294967294 4294967294 4.0K Jun 28 09:28 prot -rwxr-xr-x 1 4294967294 4294967294 141K Sep 23 11:43 3htb.pdb -rwxr-xr-x 1 4294967294 4294967294 34K Sep 23 11:43 jz4.sdf -rw-r--r-- 1 4294967294 4294967294 1.3K Sep 23 11:44 sqm.in -rw-r--r-- 1 4294967294 4294967294 7.8K Sep 23 11:44 sqm.out -rw-r--r-- 1 4294967294 4294967294 1.7K Sep 23 11:44 sqm.pdb -rw-r--r-- 1 4294967294 4294967294 2.5K Sep 23 11:44 jz4.gaff2.mol2 -rw-r--r-- 1 4294967294 4294967294 7.0K Sep 23 11:44 ANTECHAMBER.FRCMOD -rw-r--r-- 1 4294967294 4294967294 1.9K Sep 23 11:44 jz4.gaff2.frcmod -rw-r--r-- 1 4294967294 4294967294 4.0K Sep 23 11:44 stdout_renum.txt -rw-r--r-- 1 4294967294 4294967294 0 Sep 23 11:44 stdout_nonprot.pdb -rw-r--r-- 1 4294967294 4294967294 0 Sep 23 11:44 stdout_sslink -rw-r--r-- 1 4294967294 4294967294 105K Sep 23 11:44 3htb.amber.pdb -rw-r--r-- 1 4294967294 4294967294 880 Sep 23 11:45 tleap.in -rw-r--r-- 1 4294967294 4294967294 1.2M Sep 23 11:46 complex.inpcrd -rw-r--r-- 1 4294967294 4294967294 5.7M Sep 23 11:46 complex.prmtop -rw-r--r-- 1 4294967294 4294967294 17K Sep 23 11:46 leap.log drwxr-xr-x 1 root root 4.0K Sep 23 11:51 .. -rw-r--r-- 1 4294967294 4294967294 293 Sep 23 11:58 conda.log d--x--x--x 5 root root 4.0K Sep 23 12:02 . -rw-r--r-- 1 4294967294 4294967294 171 Sep 23 12:53 log
Parameterization
Prepare for small molecule ligands
We generally store lignds in a .sdf
or .mol2
file format. Do NOT use PDB format for small molecules.
If you download a protein-ligand complex PDB file from PDBBank, you can load your it by PyMol and export them to .sdf
format.
Before exporting your ligand, use PyMol to add hydrogens to get a complete structure. Do NOT add hydrogens to protein at this time, or you may get conflicts later on.
As we already prepared them for you, we will move on with the files we downloaded from the dataset.
Parameterization of Ligand
In this case, we choose GAFF (General AMBER Force Field) for parameterization. There are GAFF and GAFF2 available now, but they are basically the same for our purpose, you can choose either one. However, GAFF2 is newly revised and more popular now.
We will use ambertools
to parameterize our small molecules here.
antechamber
[http://ambermd.org/antechamber/example.html] helps us calculate the partial charges on each atom. This may take several minutes since every partial charge will be calculated by semi-empirical QM methods (AM1-BCC, -c bcc
flag). specify -nc 0
since the ligand is neutral.
parmchk2
can assign parameters to the ligands according to the atom types. You can get the parameter file jz4.gaff2.frcmod
Welcome to antechamber 22.0: molecular input file processor. Info: The atom type is set to gaff2; the options available to the -at flag are gaff, gaff2, amber, bcc, and sybyl. Info: Total number of electrons: 74; net charge: 0 Running: /opt/mamba/bin/sqm -O -i sqm.in -o sqm.out
If you get an error, please make sure you have added hydrogens to the ligand in the previous step. Otherwise, the net charge is incorrect.
Parameterization of Protein and Complex
Prepare the Amber-readable Protein PDB File.
To get a ready-to-simulation protein pdb files, we need to modify a bit the PDB files from PDBBank. Here are some general tips:
- remove all crystal waters
- remove unnecessary cofactors (ligands, ions, etc)
- add missing residues
- correct residue names
As the file downloaded is already processed, we don't need to do it here.
Before, pass it into ambertools
, we need to convert it to amber-readable PDB. Basically this step will correct some residue names and add some missing atoms.
================================================== Summary of pdb4amber for: 3htb.pdb =================================================== ----------Chains The following (original) chains have been found: A ---------- Alternate Locations (Original Residues!)) The following residues had alternate locations: MET_1 THR_21 TYR_24 ASN_68 ASP_72 ARG_80 ARG_96 GLN_102 LYS_147 ARG_154 -----------Non-standard-resnames ---------- Missing heavy atom(s) None The alternate coordinates have been discarded. Only the first occurrence for each atom was kept.
Prepare topology by tLEaP
LEaP is a powerful modeling tool in Ambertool. xLEaP enables you run scripts in a graphical window. If only terminal line is available, use tLEaP instead.
Now let’s prepare the input script of tLEaP. Use the text below and name the file as 3htb.tleap.in
:
Here we get the topology file complex.prmtop
and the coordinate file complex.inpcrd
.
The topology file contains the connectivity information, charges, force field parameters, etc;
The coordinate file contains the Cartesian coordinates of each atom.
-I: Adding /opt/mamba/dat/leap/prep to search path. -I: Adding /opt/mamba/dat/leap/lib to search path. -I: Adding /opt/mamba/dat/leap/parm to search path. -I: Adding /opt/mamba/dat/leap/cmd to search path. -f: Source tleap.in. Welcome to LEaP! (no leaprc in search path) Sourcing: ./tleap.in ----- Source: /opt/mamba/dat/leap/cmd/leaprc.protein.ff14SB ----- Source of /opt/mamba/dat/leap/cmd/leaprc.protein.ff14SB done Log file: ./leap.log Loading parameters: /opt/mamba/dat/leap/parm/parm10.dat Reading title: PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ff14SB Reading force field modification type file (frcmod) Reading title: ff14SB protein backbone and sidechain parameters Loading library: /opt/mamba/dat/leap/lib/amino12.lib Loading library: /opt/mamba/dat/leap/lib/aminoct12.lib Loading library: /opt/mamba/dat/leap/lib/aminont12.lib ----- Source: /opt/mamba/dat/leap/cmd/leaprc.gaff2 ----- Source of /opt/mamba/dat/leap/cmd/leaprc.gaff2 done Log file: ./leap.log Loading parameters: /opt/mamba/dat/leap/parm/gaff2.dat Reading title: AMBER General Force Field for organic molecules (Version 2.2.20, March 2021) ----- Source: /opt/mamba/dat/leap/cmd/leaprc.water.tip3p ----- Source of /opt/mamba/dat/leap/cmd/leaprc.water.tip3p done Loading library: /opt/mamba/dat/leap/lib/atomic_ions.lib Loading library: /opt/mamba/dat/leap/lib/solvents.lib Loading parameters: /opt/mamba/dat/leap/parm/frcmod.tip3p Reading force field modification type file (frcmod) Reading title: This is the additional/replacement parameter set for TIP3P water Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ions1lm_126_tip3p Reading force field modification type file (frcmod) Reading title: Li/Merz ion parameters of monovalent ions for TIP3P water model (12-6 normal usage set) Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ionsjc_tip3p Reading force field modification type file (frcmod) Reading title: Monovalent ion parameters for Ewald and TIP3P water from Joung & Cheatham JPCB (2008) Loading parameters: /opt/mamba/dat/leap/parm/frcmod.ions234lm_126_tip3p Reading force field modification type file (frcmod) Reading title: Li/Merz ion parameters of divalent to tetravalent ions for TIP3P water model (12-6 normal usage set) Loading parameters: ./jz4.gaff2.frcmod Reading force field modification type file (frcmod) Reading title: Remark line goes here Loading Mol2 file: ./jz4.gaff2.mol2 Reading MOLECULE named JZ4 Loading PDB file: ./3htb.amber.pdb Added missing heavy atom: .R<CASN 163>.A<OXT 15> total atoms in file: 1300 Leap added 1314 missing atoms according to residue templates: 1 Heavy 1313 H / lone pairs Solute vdw bounding box: 43.434 44.621 52.972 Total bounding box for atom centers: 67.434 68.621 76.972 Solvent unit box: 18.774 18.774 18.774 Total vdw box size: 70.504 71.808 79.796 angstroms. Volume: 403987.403 A^3 Total mass 198988.730 amu, Density 0.818 g/cc Added 10010 residues. /opt/mamba/bin/teLeap: Warning! addIonsRand: 1st Ion & target unit have charges of the same sign: unit charge = 6; ion1 charge = 1; can't neutralize. 6 Cl- ions required to neutralize. Adding 6 counter ions to "complex". 10004 solvent molecules will remain. 0: Placed Cl- in complex at (19.40, 8.70, 10.53). 0: Placed Cl- in complex at (10.00, -22.19, -26.97). 0: Placed Cl- in complex at (31.42, -18.51, -34.60). 0: Placed Cl- in complex at (-2.85, 0.66, -24.86). 0: Placed Cl- in complex at (-17.07, -15.26, 21.14). 0: Placed Cl- in complex at (0.91, 27.90, 35.47). Adding 10 counter ions to "complex". 9994 solvent molecules will remain. 0: Placed Na+ in complex at (-7.96, 8.61, 24.31). 0: Placed Na+ in complex at (-5.24, 17.69, 31.78). 0: Placed Na+ in complex at (-29.58, -32.79, -37.63). 0: Placed Na+ in complex at (24.50, -14.55, 11.93). 0: Placed Na+ in complex at (-9.80, 14.45, -10.58). 0: Placed Na+ in complex at (24.09, -17.04, -19.71). 0: Placed Na+ in complex at (1.11, 17.65, 34.45). 0: Placed Na+ in complex at (4.48, -23.14, -6.77). 0: Placed Na+ in complex at (-17.93, -27.81, 7.55). 0: Placed Na+ in complex at (-15.41, -4.75, -11.15). Adding 10 counter ions to "complex". 9984 solvent molecules will remain. 0: Placed Cl- in complex at (20.51, 32.03, 35.45). 0: Placed Cl- in complex at (-9.59, 18.42, -13.25). 0: Placed Cl- in complex at (-33.44, 20.78, 11.72). 0: Placed Cl- in complex at (-9.17, -14.18, 23.44). 0: Placed Cl- in complex at (9.51, -24.63, 37.59). 0: Placed Cl- in complex at (27.21, 20.13, -24.83). 0: Placed Cl- in complex at (-16.46, -12.76, -17.80). 0: Placed Cl- in complex at (-31.11, 16.55, 0.48). 0: Placed Cl- in complex at (12.65, -18.51, 21.72). 0: Placed Cl- in complex at (31.17, -32.78, -30.53). Checking Unit. Building topology. Building atom parameters. Building bond parameters. Building angle parameters. Building proper torsion parameters. Building improper torsion parameters. total 534 improper torsions applied Building H-Bond parameters. Incorporating Non-Bonded adjustments. Not Marking per-residue atom chain types. Marking per-residue atom chain types. (Residues lacking connect0/connect1 - these don't have chain types marked: res total affected CASN 1 JZ4 1 NMET 1 WAT 9984 ) (no restraints) Quit Exiting LEaP: Errors = 0; Warnings = 1; Notes = 0.
Let's Simulate!
You can refer to [http://docs.openmm.org/7.6.0/userguide/application/02_running_sims.html] OpenMM documents for more details. But here we will go beyond that.
For a standard MD simultion, we need the following steps:
- Energy Minimization
the structures we built above is way far from the stable equilbrium states. The water molecules are randomly placed and some atoms have bad contacts or overlaps with each other. If we start directly from this file, we can end up with numerical explosion. (Typically
nan
energy, velocities and coordinates) - NVT Equilibrium (heating)
We don't have the velocity information right now. One way is to assign velocities to each atom according to Boltzman distribution. However, this can give a super large energy state of the system which could fail simulations. A safer way is to increase the temperature gradually by coupling the system to a heat bath and monitor the temperature till equilibrium. This algorithm can be achieved by Langebin dynamics or other heat bath.
- NPT Equilibrium (pressurizing)
NPT is a more realistic ensemble to real situations. To this end, we need to couple the system to a barostat. Again, we need a couple of picoseconds or nanoseconds to allow the systems reach a equilibrium. The pressure of micro particle system is calculated by Virial tensor. However, pressure is actually a macro property of a huge bunch of particles that can vary drastically in micro systems. Therefore, instead of monitoring pressure, we will monitor the density as a proxy.
- Production Run under NPT
Finally, after all equilibrium steps, we will propagate our system for a fairly long time scale to sample the desired phase space extensively.
Now let's load the files we have prepared.
Then create a system based the topology. We will allow periodic boundary condition by specify nonbondedMethod=PME
.
PME
stands for Particle Mesh Ewald summantion methods. It uses fast fourior transormation to calculate slowly converged long-range interactions rapidly.
We need to choose an integrator to propagate the system. Instead of standard VV algorithm, we choose LagevinMiddleIntegrator
to couple the system to heat bath in the integrator level.
Energy minimization
NVT Equilibrium
Warming up the system... heating to 5 K #"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)","Density (g/mL)" 5000,-528112.6655548871,24.698794077986207,403.9874031868558,0.8192689923107143 10000,-522999.79055488715,54.574706776363364,403.9874031868558,0.8192689923107143 15000,-517545.66555488715,83.88603284208452,403.9874031868558,0.8192689923107143 heating to 105 K 20000,-511602.29055488715,114.47304457150372,403.9874031868558,0.8192689923107143 25000,-503905.91555488715,144.8499855913975,403.9874031868558,0.8192689923107143 30000,-495757.04055488715,174.43934329962534,403.9874031868558,0.8192689923107143 heating to 205 K 35000,-483894.91555488715,202.3606573622087,403.9874031868558,0.8192689923107143 40000,-466474.41555488715,232.6962741209631,403.9874031868558,0.8192689923107143 45000,-447218.66555488715,263.0941494911721,403.9874031868558,0.8192689923107143 50000,-430301.41555488715,292.96518588481234,403.9874031868558,0.8192689923107143
Now you can see we heat the systems to 300K.
Under NVT ensemble, you can easily see that the box volumn remains constant.
NPT Equilibrium
Now you can see we compress the systems around a constant density.
Under NPT ensemble, the box, temperature and density fluctuate around a specific value during simulation. It's not as strict as the statistical therodynamics requirement due to the numerical issues.
Hui_Zhou
Bohrium小助手