- This notebook was adopted from an independent course final projects (MIT 10.437/10.637/5.697/5.698 Computational Chemistry 2023).
- In theory, only NVE simulations can guarantee a strict diffusion coefficient calculation, while this notebook seeks to know how NVT/NPT influence the calcultion of diffusion coefficients.
- You can modify the code below to accomplish NVE simulations by
OpenMM
- For motivation and theory part, one may refer to : 拿分子动力学算个扩散系数,怎么这么多讲究?
- 如果你想阅读中文,请点击右侧的翻译按钮翻译成中文。
1. Introduction
Diffusion coefficient is an import transport property. One can easily obtain the diffusion coefficient from the Green-Kuo equation as1:
where is the number of particles, is the velocity of the ’th particle. However, the autocorrelation function of particle’s velocity converges very slowly and would be affected largely by the finite size effect. A more practical expression is the Einstein relation:
and the mean square displacement (MSD) is defined as: In practice, the Einstein formula gains better numerical stability and becomes more popular. Nevertheless, calculating diffusion coefficient is not a trivial problem. Many parameters and setting in molecular dynamics simulation may influence the final results and lead to unpleasant artifacts. In this project, we will go through several aspects those can affect the diffusion coefficient calculations, including the choice of ensemble, finite size effects (of cubic boxes) and the thermostat effects. Careful tuning simulation parameters is critical to satisfying results and this project provides practical guidance about how to effectively setup molecular dynamics simulations for transport properties.
2. Theory
Finite size effect on diffusion coefficient calculation
MD programs can only simulate finite systems in practice due to the finite computational resources. To mimic the bulk property in the real world and eliminate the surface effect, the periodical boundary condition (PBC) is widely used, where the copies (or images) of the simulated box is periodically placed together forming an infinite system. However, due to the long-range nature of electrostatic interactions, the thermodynamic and transport properties may suffer from the PBC artifacts as the molecules can actually see their images in neighboring cells2. The artifacts become severe especially when the box size approaching to the nonbonded interaction cutoff when using Particle Mesh Ewald (PME)3 methods for electrostatic interactions.
Dünweg and Kremer4 firstly found the computed diffusion coefficient is proportional to N^(1/3). Yeh and Hummer5 finally derived an analytical relation between computed diffusion coefficients and the inverse box size, known as the Yeh-Hummer (YH) relation:
where is the computed self-diffusion coefficient at a given box size; is the corrected diffusion coefficient for an infinite box; is the Boltzmann constant; is the temperature; is a dimensionless constant equal to for cubic lattices (This value was measured under NVE ensemble); is the shear viscosity computed from the simulation; is the box length of cubic boxes. From the YH relation, we would expect the computed diffusion coefficient would generally become larger in a larger box and the extrapolated gives the greatest value. We also would expect that when the box length L is large enough, the computed diffusion coefficient may converge slowly and be close to .
However, YH relation was proposed at a time when only a few thousands of water molecules were affordable at large. As we have faster and better computational resources, this effect may be less significant. In the Result Section, we will examine the YH relation for a wide range of numbers of water molecules, and numerically show how change when the box is sufficiently large.
Effect of the Barostat
Different thermodynamic ensembles have different simulation techniques. A barostat is exerted on the system to maintain a constant pressure under NPT ensemble simulations. However, barostats usually control pressure by scaling the coordinates of particles globally. The scaling technique can drastically affect the calculation of MSD and thus affect diffusion coefficient results. Monte Carlo Barostat6 is widely used for NPT ensemble MD simulations, which uses MD/MC scheme to sample volume fluctuations and control system pressure. For a NPT ensemble, the thermodynamic probability is expressed as:
where is the kinetic energy; is the potential energy; is the external pressure and is the volume. After each regular MD step, a trial MC step is proposed to change the system volume by scaling the coordinates as:
is the box length, is the coordinates and denotes the coordinates of the center of the periodic box. The Metropolis weight is then calculated by:
where is the desired pressure, . The trial move will be conditionally accepted with probability if and always accepted otherwise.
When scaling the coordinates by , the MSD would have an error approximately to . Therefore, we would expect the results from NPT ensemble are significantly different from the NVT/NVE ensemble. When the box size becomes larger, becomes larger too given a fixed . So, we also would expect that finite size effect becomes significant in NPT simulations.
Effect of the thermostats
Despite most papers have reported their results under NVT ensembles, the parameters of thermostats they use can also significantly influence the diffusion coefficient results. In NVT ensemble, the temperature is calculated as:
Essentially, maintaining temperature means controlling the kinetic energy of particles. Thermostats are a bunch of algorithms helping NVT ensemble simulations be within reasonable temperature fluctuations by tuning the velocities of particles. They are effectively correlating the system to a heat bath. Two main categories exist in thermostats: the stochastic algorithm, e.g., the Andersen thermostat , the family of Langevin Thermostats8 , etc.; and the scaling-based algorithm, e.g., the Nosé-Hoover (Chain) Thermostat9,10 , the V-rescaling thermostat11 , etc.
The stochastic algorithms use stochastic collisions on particles to mimic the heat conductivity. In the Andersen thermostat scheme, during each regular MD step, a set of particles are selected whose velocities will be resampled from the velocity Boltzmann distribution. The selection probability is tuned by the parameter called “collision frequency”. When the velocities of particles are changed/resampled, the history of kinetic information is totally discarded. As expected, when the collision frequency is high, the computed diffusion coefficient is low. In a limiting situation, if all particles get collision at every step, no kinetic information is kept and the results will be similar to that from a random walk.
For the theory of thermostats in detail, you may refer to the following notebooks:
Effect of water constraints
Hydrodynamics is important for property calculation. However, covalent hydrogen bonds typically have very high vibrational frequency, suffering from high frequency quantum effect and numerical instability. To overcome this problem, constraint techniques were proposed. The bond length and bond angle are constrained at fixed values based on algorithms during simulations. Generally used constraint algorithms include SHAKE12,13 , SETTLE14 , RATTLE15 etc.
The flexibility of water molecules can largely influence the diffusion coefficient calculation. Some papers16 reported that diffusion coefficients were very sensitive to the bond length of water molecules. However, due to the high frequency nature of the covalent hydrogen bond, one has to use smaller time step (at least no greater than 1 fs) for numerical stability. This leads to significant longer simulation time in real situations.
Simulations
In this section, we will use OpenMM 8.0
to perform numerical experiments. GPU computational nodes are recommended.
A code template is provided below.
In this project, we choose TIP3P water model as the example model, parameterized by Li et al in 2015 17 . This water model is currently the standard water model in AmberTools18 tleap program.
The well-equilibrated water models are provided under /bohr/WaterTip3pBoxes-fmf8/v1/models
All simulations are performed via OpenMM 8.019 . Temperature was set to 300 K. The periodical boundary condition was applied for all simulations. Nonbonded interactions are calculated under the Particle Mesh Ewald (PME) scheme, where the nonbonded cutoff is set to 1 nanometer.
tip3p_10.0.prmtop* tip3p_17.prmtop* tip3p_10.0_equilibrated.inpcrd* tip3p_17_equilibrated.inpcrd* tip3p_11.0.prmtop* tip3p_18.prmtop* tip3p_11.0_equilibrated.inpcrd* tip3p_18_equilibrated.inpcrd* tip3p_12.0.prmtop* tip3p_19.prmtop* tip3p_12.0_equilibrated.inpcrd* tip3p_19_equilibrated.inpcrd* tip3p_13.0.prmtop* tip3p_20.prmtop* tip3p_13.0_equilibrated.inpcrd* tip3p_20_equilibrated.inpcrd* tip3p_14.0.prmtop* tip3p_25.prmtop* tip3p_14.0_equilibrated.inpcrd* tip3p_25_equilibrated.inpcrd* tip3p_15.0.prmtop* tip3p_30.prmtop* tip3p_15.0_equilibrated.inpcrd* tip3p_30_equilibrated.inpcrd* tip3p_16.prmtop* tip3p_35.prmtop* tip3p_16_equilibrated.inpcrd* tip3p_35_equilibrated.inpcrd*
Code template for simulations, please modify as needed
Code templates for different tests are provided below. They may cost a significant amount of time to be finished. We recommend you using computational clusters or submit jobs to Bohrium.
Code template for analysis, please modify as needed
After simulations, use Ambertools to get the diffusion coefficients from the trajectories.
A code template is provided below.
Code template for visualization
Results
Finite size effects are well captured by Yeh-Hummer relation
To obtain the results of different box sizes, we performed MD simulations of water boxes with different sizes. Since the nonbonded cutoff was set to 1 nanometer, the smallest built initial box length has to be greater than 2 nanometers, otherwise the periodic images will lead to great artifacts.
From the simulation results (shown in Figure 1), we observed that the computed diffusion coefficients become larger at larger box sizes in general. The results basically obey the YH relation with coefficients equal to 0.391.
Another observation is that with the increase of the box size, the variance of the computed diffusion coefficients becomes smaller. Simulating larger water boxes is beneficial to get statistically confidential results. It’s worth noting that in the previous work of Yeh and Hummer, they simulated water boxes containing 128, 256, 512, 1024 and 2048 respectively. The number of water molecules in our systems ranges from 371 to 12626, which is significantly broader than that in Yeh’ work. However, even with as many as 12626 water molecules, the calculated diffusion coefficient still has the general trend. The extrapolated , owning the greatest values as well as the largest error compared to the experimental results . However, this values practically eliminate the effect of finite size and therefore reflect the performance of force fields better.
Diffusion coefficients from NPT ensembles are different from NVT simulations
In general, NPT simulation (red) showed similar results to NVT simulations (blue) and obeyed Yeh-Hummer relation5 (shown in Figure 1) with R2 equals to 0.502. The extrapolated is , larger than . The fitted slope of NPT results (-0.606) is also larger than that in NVT results (-0.436), indicating NPT simulation was more sensitive to the cubic box size. This was expected because the barostats in NPT simulation scales the particle coordinates globally as we discussed in the Theory Section, although the difference between the two ensembles is not very significant under current simulation setting. Nevertheless, we still recommend not to use NPT ensemble simulations to compute the diffusion coefficient because of those artifacts.
Nosé-Hoover Thermostat significantly increases diffusion coefficients than stochastic schemes
Nosé-Hoover thermostats showed significantly larger diffusion coefficients than all stochastic thermostats (shown in Figure 2). The scaling-based algorithm keeps most history of the kinetic information and makes trajectory more continuous. So, the diffusion coefficient becomes larger given the same interaction interval () with the heat bath.
Among the stochastic thermostats, Andersen thermostats had the largest diffusion coefficient when the collision frequency was set to . A possible reason is that, in Andersen scheme, the particles not selected were still keeping NVE trajectories, while in Langevin scheme, all particles were experiencing dissipation and collision and therefore likely lost more kinetic information. From this angle, the NVE ensemble is a better candidate for transport property calculations in principle because no thermostat or barostat disturb the trajectories. From the statistical mechanic theory, when the number of particles is large enough, NVE ensemble, NVT ensemble and NPT ensemble would resemble with each other.
In summary, all stochastic thermostat caused the collapse of the kinetic information. NVE simulation may be desired when system is large enough.
Higher collision frequency causes more randomness in simulations
To better understand how the stochastic thermostats affect the calculation of transport properties, we further studied the results calculated from different collision frequencies (shown in Figure 3) using Andersen thermostats and Langevin thermostats. As expected, when the collision frequency becomes larger, the diffusion coefficients decrease due to the loss of kinetic information. When the collision frequency is very high (), the results from the two stochastic algorithms are approaching to each other as they resemble Brownian dynamics gradually.
Although low collision frequency largely keeps kinetic information, it can’t maintain the temperature well with the largest standard deviation on diffusion calculations (). We here recommend collision frequency for stochastic thermostats. This value has the good capability to maintain temperature as well as gives results close to those in low collision frequency (rel. error 6%).
It is worth noting that the decreased diffusion coefficients by increase collision rate doesn’t mean the reduced error to experimental value . It is just the artifact from the nature of randomness.
Flexible water models are more accurate
No significant difference was found among the constrained water models (shown in Figure 4). However, the flexible water model had a significantly lower diffusion coefficient value and much closer to the experimental results. The flexibility made diffusion of water molecules slower than rigid models. A possible reason is that the flexibility changed the collision behavior between molecules. Flexible water models also have larger both shear and bulk viscosity. The interactions between flexible water molecules are larger than those in rigid models. The water structure and hydrogen bond network also changed because of the flexibility. All these factors slowed down the motion of water and caused a lower diffusion coefficient.
In practical speaking, flexible water models need smaller time step (). The wall time for flexible water model is twice as the rigid water models. Therefore, the choice of constraints is the trade-off between accuracy and speed. When computational resources are enough, not using constraints is recommended.
However, one should note that generally flexible water models have different force field parameters from the rigid models. They are usually separately parameterized. In this article, we only released the constraints while still keeping the TIP3P parameters just to show the effect solely from constraints. In practice, TIP3P is widely used as a rigid water model. If a flexible water model is needed, one may refer to SPC/FW16, TIP3P/FW16, etc. to gain higher accuracy.
Summary
In this study, we examined several molecular dynamics simulation settings those can affect the calculations of diffusion coefficients. We provided the recommended simulation guidance as follows. One should calculate diffusion coefficients under NVE ensemble in principle with sufficiently large simulation box. When NVT ensemble is necessary, one should choose thermostat carefully with recommended collision frequency. Nosé-Hoover thermostat can get significantly larger computed diffusion coefficient. Avoiding NPT ensemble is necessary. When box size is limited, one can use YH relation extrapolation to get approximated diffusion coefficient values of infinite box size. Water constraints can fasten water diffusion and result in much larger diffusion coefficients. Flexible waters are highly recommended if accuracy matters in the questions of interest.
Reference
(1) Wang, J.; Hou, T. Application of Molecular Dynamics Simulations in Molecular Property Prediction II: Diffusion Coefficient. J. Comput. Chem. 2011, 32 (16), 3505–3519. https://doi.org/10.1002/jcc.21939.
(2) Beglov, D.; Roux, B. Finite Representation of an Infinite Bulk System: Solvent Boundary Potential for Computer Simulations. J. Chem. Phys. 1994, 100 (12), 9050–9063. https://doi.org/10.1063/1.466711.
(3) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N ⋅log( N ) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98 (12), 10089–10092. https://doi.org/10.1063/1.464397.
(4) Dünweg, B.; Kremer, K. Microscopic Verification of Dynamic Scaling in Dilute Polymer Solutions: A Molecular-Dynamics Simulation. Phys. Rev. Lett. 1991, 66 (23), 2996–2999. https://doi.org/10.1103/PhysRevLett.66.2996.
(5) Yeh, I.-C.; Hummer, G. System-Size Dependence of Diffusion Coefficients and Viscosities from Molecular Dynamics Simulations with Periodic Boundary Conditions. J. Phys. Chem. B 2004, 108 (40), 15873–15879. https://doi.org/10.1021/jp0477147.
(6) Åqvist, J.; Wennerström, P.; Nervall, M.; Bjelic, S.; Brandsdal, B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chem. Phys. Lett. 2004, 384 (4–6), 288–294. https://doi.org/10.1016/j.cplett.2003.12.039.
(7) Andersen, H. C. Molecular Dynamics Simulations at Constant Pressure and/or Temperature. J. Chem. Phys. 1980, 72 (4), 2384–2393. https://doi.org/10.1063/1.439486.
(8) Zhang, Z.; Liu, X.; Yan, K.; Tuckerman, M. E.; Liu, J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123 (28), 6056–6079. https://doi.org/10.1021/acs.jpca.9b02771.
(9) Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé–Hoover Chains: The Canonical Ensemble via Continuous Dynamics. J. Chem. Phys. 1992, 97 (4), 2635–2643. https://doi.org/10.1063/1.463940.
(10) Posch, H. A.; Hoover, W. G.; Vesely, F. J. Canonical Dynamics of the Nosé Oscillator: Stability, Order, and Chaos. Phys. Rev. A 1986, 33 (6), 4253–4265. https://doi.org/10.1103/PhysRevA.33.4253.
(11) Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126 (1), 014101. https://doi.org/10.1063/1.2408420.
(12) Elber, R.; Ruymgaart, A. P.; Hess, B. SHAKE Parallelization. Eur. Phys. J. Spec. Top. 2011, 200 (1), 211–223. https://doi.org/10.1140/epjst/e2011-01525-9.
(13) Forester, T. R.; Smith, W. SHAKE, Rattle, and Roll: Efficient Constraint Algorithms for Linked Rigid Bodies. J. Comput. Chem. 1998, 19 (1), 102–111. https://doi.org/10.1002/(SICI)1096-987X(19980115)19:1<102::AID-JCC9>3.0.CO;2-T.
(14) Miyamoto, S.; Kollman, P. A. Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13 (8), 952–962. https://doi.org/10.1002/jcc.540130805.
(15) Andersen, H. C. Rattle: A “Velocity” Version of the Shake Algorithm for Molecular Dynamics Calculations. J. Comput. Phys. 1983, 52 (1), 24–34. https://doi.org/10.1016/0021-9991(83)90014-1.
(16) Wu, Y.; Tepper, H. L.; Voth, G. A. Flexible Simple Point-Charge Water Model with Improved Liquid-State Properties. J. Chem. Phys. 2006, 124 (2), 024503. https://doi.org/10.1063/1.2136877.
(17) Li, P.; Song, L. F.; Merz, K. M. Systematic Parameterization of Monovalent Ions Employing the Nonbonded Model. J. Chem. Theory Comput. 2015, 11 (4), 1645–1657. https://doi.org/10.1021/ct500918t.
(18) Case, D. A.; Aktulga, H. M.; Belfon, K.; Cerutti, D. S.; Cisneros, G. A.; Cruzeiro, V. W. D.; Forouzesh, N.; Giese, T. J.; Götz, A. W.; Gohlke, H.; Izadi, S.; Kasavajhala, K.; Kaymak, M. C.; King, E.; Kurtzman, T.; Lee, T.-S.; Li, P.; Liu, J.; Luchko, T.; Luo, R.; Manathunga, M.; Machado, M. R.; Nguyen, H. M.; O’Hearn, K. A.; Onufriev, A. V.; Pan, F.; Pantano, S.; Qi, R.; Rahnamoun, A.; Risheh, A.; Schott-Verdugo, S.; Shajan, A.; Swails, J.; Wang, J.; Wei, H.; Wu, X.; Wu, Y.; Zhang, S.; Zhao, S.; Zhu, Q.; Cheatham, T. E.; Roe, D. R.; Roitberg, A.; Simmerling, C.; York, D. M.; Nagan, M. C.; Merz, K. M. AmberTools. J. Chem. Inf. Model. 2023, 63 (20), 6183–6191. https://doi.org/10.1021/acs.jcim.3c01153.
(19) Eastman, P.; Swails, J.; Chodera, J. D.; McGibbon, R. T.; Zhao, Y.; Beauchamp, K. A.; Wang, L.-P.; Simmonett, A. C.; Harrigan, M. P.; Stern, C. D.; Wiewiora, R. P.; Brooks, B. R.; Pande, V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLOS Comput. Biol. 2017, 13 (7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659.
(20) Roe, D. R.; Cheatham, T. E. Parallelization of CPPTRAJ Enables Large Scale Analysis of Molecular Dynamics Trajectory Data. J. Comput. Chem. 2018, 39 (25), 2110–2117. https://doi.org/10.1002/jcc.25382.
(21) Roe, D. R.; Cheatham, T. E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9 (7), 3084–3095. https://doi.org/10.1021/ct400341p.
(22) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26 (16), 1668–1688. https://doi.org/10.1002/jcc.20290.
(23) Reuther, A.; Kepner, J.; Byun, C.; Samsi, S.; Arcand, W.; Bestor, D.; Bergeron, B.; Gadepally, V.; Houle, M.; Hubbell, M.; Jones, M.; Klein, A.; Milechin, L.; Mullen, J.; Prout, A.; Rosa, A.; Yee, C.; Michaleas, P. Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis. In 2018 IEEE High Performance extreme Computing Conference (HPEC); IEEE: Waltham, MA, 2018; pp 1–6. https://doi.org/10.1109/HPEC.2018.8547629.