Getting started with DMFF: A comprehensive beginner's tutorial.
In the simulation of molecular systems, the underlying force field (FF) model plays an extremely important role, determining the reliability of the simulation. However, the quality of the state-of-the-art molecular force fields is still unsatisfactory in many cases, and the FF parameterization process largely relies on human experience, which is not scalable. To address this issue, we introduce DMFF (Differentiable Molecular Force Field), an open-source molecular FF development platform based on automatic differentiation technique. Using DMFF, both energies/forces and thermodynamic quantities such as ensemble averages and free energies can be evaluated in a differentiable way, realizing an automatic, yet highly flexible force field optimization workflow
For more details, please refer to this link: https://mp.weixin.qq.com/s/eVXTr1eU1-dGbC5UFybr0g
Note: JAX, DMFF, and other related tools are still undergoing rapid development and iteration. This notebook has been successfully executed using DMFF 0.2.0.
Choose dmff:0.2.0-notebook and you can directly run this notebook on Bohrium
This notebook is primarily organized by Wei Feng.
Before You Run
Molecular systems are a highly important category of systems, encompassing various organic molecules such as biomolecules, drug molecules, various coatings used in industries, porous materials like COFs, polymers and small molecule electrolytes in batteries, and more. Problems of interest in industries, such as structure prediction of biomolecules, drug screening, screening and design of materials, heavily rely on molecular dynamics (MD) simulations of molecular systems. Although the applications of MD have extended beyond molecular systems, simulating molecular systems remains one of the core applications of MD.
MD simulations usually rely on a potential energy surface (PES), which describes the energies and the forces of the simulated atoms. For the sake of computational efficiency, the PES is typically approximated using a classical model, namely molecular force field, instead of being computed ab initioly on-the-fly. Therefore, the quality of the underlying force field limits the accuracy and the predictive power of the simulation.
So it's obvious that the core of simulation is fitting microscale interactions, or potential functions, to accurately predict macroscopic properties. In classical molecular dynamics, the potential function is expressed mathematically in terms of a molecular force field.
What Is Molecular Force Field
As we mentioned above, one of the most important aspects in simulating molecular systems is characterizing the system's potential energy function and describing the interactions within the system. In Classical Molecular Dynamics (CMD) simulations, the potential energy function follows a fixed mathematical form, and the force field provides the mathematical functions for intermolecular and intramolecular interaction potentials. The traditional force fields commonly used in the industry, such as OPLS and GAFF, generally share a similar form:
The total energy can be divided into bonded and nonbonded components. The bonded component naturally depends on the intramolecular coordinates (bond lengths, bond angles, dihedral angles), while the nonbonded component (further decomposable into van der Waals interactions and electrostatic interactions) naturally depends on the interatomic distances. This form is almost universally used as the standard form in the entire industry. However, the molecular force fields currently employed in industrial applications face several challenges:
- Lack of portability and predictive capability: When studying new systems, it is often uncertain which force field will yield better results until actual simulations are performed. Apart from relying on "experience," it can be challenging to determine the appropriate force field selection criteria.
- Parameter's non-uniqueness and inconsistency: It is common to encounter multiple sets of completely different parameters that yield similar macroscopic predictions. As a result, it is not possible to solely rely on macroscopic data to determine which set of parameters is more reasonable at the microscale. Moreover, similar molecular systems often have entirely different force field parameters, making it challenging to combine force fields developed by different research groups.
Over the past few decades, the development of empirical force fields and the improvement of computational accuracy have been both an age-old and emerging topic. The quest to accurately and efficiently describe atomic interactions and establish a technical roadmap for developing a molecular force field applicable to the majority of systems has been the focus of researchers in the field of molecular force fields and molecular dynamics.
The optimization of parameters in force fields has long relied on manual intervention and "empirical" parameter tuning methods. This reliance raises concerns about the reliability and efficiency of empirical force field fitting. However, with the advent of the artificial intelligence era, one underlying technology, automatic differentiation, offers a new solution. This technology has paved the way for the development of Differentiable Molecular Force Field (DMFF), which holds the promise of becoming a powerful tool for molecular force field developers.
Automatic DIfferentiation & DMFF
Automatic differentiation is a high-precision and versatile method for computing derivatives in computer programs. It follows the mathematical chain rule to compute derivatives of composite functions. By tracing the derivative chains of every data point based on the computation graph, it can calculate the differentiation of the output with respect to the input variables.
Automatic differentiation plays a crucial role in optimizing neural network models. During model training, it is necessary to compute the gradients of the output with respect to the input variables through backpropagation and utilize gradient descent to optimize the model parameters. Automatic differentiation frameworks excel at achieving this task. With their efficient optimization capabilities for high-dimensional parameters, automatic differentiation techniques can be applied not only to neural network models but also to any framework that follows the "input model parameters → apply model computation → obtain computed results" structure. Molecular dynamics simulations precisely follow this workflow.
Therefore, leveraging automatic differentiation techniques and utilizing experimental or first-principles computed data as references, it is possible to optimize force field parameters by calculating the differentiation of the output results with respect to the input parameters through backpropagation. This enables the optimization of force field parameters in molecular dynamics simulations, just as in the optimization of neural network models.
Several attempts have been made to harness automatic differentiation techniques for molecular dynamics simulations, such as TorchMD, JAX-MD, and SPONGE. However, the deep computational graph involved in molecular dynamics simulations often introduces additional challenges. The backpropagation process, which calculates the differentiation from the output results to the input variables, can be computationally expensive. Additionally, there is still a lack of simulation engines specifically designed for rapid implementation and parameter optimization of force fields. Molecular force field developers urgently require comprehensive support for a wider range of force field functional forms and various types of objective functions. Now DMFF comes.
DMFF provides a comprehensive and rapid implementation of force field models and offers differentiable estimators of system energy, forces, and thermodynamic quantities. These differentiable estimators allow for the definition of corresponding object functions, enabling an automatic optimization process.
As mentioned above, the deep computational graph spanning the entire trajectory in molecular dynamics simulations would be computationally expensive and time-consuming. However, this limitation can be mitigated by employing a reweighting scheme for the trajectory. In DMFF, the reweighting algorithm is incorporated into the MBAR method, and extends the differentiable estimators for average properties and free energy calculations. We will demonstrate this method with practical examples to showcase its effectiveness.
DMFF gives a solution to two major problems in molecular force field development:
- How to ensure the rapid implementation and iteration of complex force fields in molecular dynamics simulations?
- How to improve the efficiency of optimizing parameters in high-dimensional functions and automate this process, while increasing the transferability of the parameters?
In our latest release, DMFF 0.2.0, we have explored a pathway for automatic optimization of force field parameters. The corresponding workflow has been validated in both simple small molecular systems and more complex electrolyte systems. We will guide you through a step-by-step notebook experience, demonstrating the complete workflow of DMFF for automatic optimization of force field parameters. Furthermore, using the notebook as a teaching template, we will provide a real hands-on experience showcasing the impact of DMFF on force field development.
Table of Contents:
- 1. Quick Start Guide to DMFF
- 2. Generating and extending potential functions
- 3. Bottom-Up Fitting
- 4.Using DMFF for force field optimization of liquid dimethyl carbonate (DMC)
- 4.1 Introduction
- 4.2 Definition of potential function
- 4.3 Definition of OpenMM sampler
- 4.4 Initial MD sampling
- 4.5 Definition of property calculation functions
- 4.6 Read the data and perform the comparison
- 4.7 Estimator initialization
- 4.8 Definition of target ensemble
- 4.9 Definition of loss function
- 4.10 Optimizer setup and optimization loop
- 5. Summary & Outlook
First, we set up the runtime environment and import the potential functions needed.
Already on 'devel' Your branch is up to date with 'origin/devel'.
In addition to DMFF, we also need to use JAX and OpenMM.
- OpenMM: Manages the force field files and parameter data
- JAX:framework of differentiation
At the same time, we will also use other libraries and trajectory analysis software such as mdtraj in the subsequent examples. Let's import them as well.
WARNING:pymbar.timeseries:Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate. The estimate presumes the use of many statistically independent samples. Tests should be performed to assess whether this condition is satisfied. Be cautious in the interpretation of the data.
Navigate to the working directory
1.1 Load existing force field parameters and topology | OpenMM Frontend
If you are using a force field in OpenMM, you can load the force field parameters using the ForceField
class, like:
app.Topology.loadBondDefinitions("lig-top.xml")
pdb = app.PDBFile('lig.pdb')
pdb_system = pdb.topology
forcefield = ForceField('someforcefield.xml')
In DMFF, there is a functionality class called Hamiltonian
that is similar to OpenMM's ForceField
class. It allows you to read force field parameters and define a more generalized system potential energy function. At the same time, it is compatible with reading existing force field parameters.
- We can use OpenMM to read PDB and topology files:
- topology(
log-top.xml
) - PDB(
lig.pdb
)
- topology(
- Load the force field parameters using DMFF's Hamiltonian class and create a differentiable potential energy function:
- GAFF force field file (
gaff-2.11.xml
) - Additional parameters for the corresponding molecule (assigned charges) (
lig-prm.xml
)
- GAFF force field file (
The DMFF potential function, apart from the name "Hamiltonian," is used in a similar way to OpenMM. The XML files of OpenMM force fields can also be directly reused.
In DMFF, the parameters and calculations of the potential function are managed by JAX. For example, the DMFF potential function includes the re-implemented HarmonicBondForce, HarmonicAngleForce, PeriodicTorsionForce, and NonbondedForce. The parameters in DMFF are JAX-wrapped Array objects. For example, if we define:
jnp.array([1.])
You will obtain a DeviceArray
type. This type has a similar interface to numpy.ndarray(), as both are high-performance arrays/matrices. However, unlike numpy, JAX's jax.numpy.array() can be extended to work with GPUs/TPUs, making it suitable for efficient automatic differentiation.
HarmonicBondForce <function HarmonicBondJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d77f0> HarmonicAngleForce <function HarmonicAngleJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d7c70> PeriodicTorsionForce <function PeriodicTorsionJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d7d00> NonbondedForce <function NonbondedJaxGenerator.createForce.<locals>.potential_fn at 0x7f32ec5817e0> sigfix <class 'jaxlib.xla_extension.DeviceArray'> [] epsfix <class 'jaxlib.xla_extension.DeviceArray'> [] sigma <class 'jaxlib.xla_extension.DeviceArray'> shape: (97,) epsilon <class 'jaxlib.xla_extension.DeviceArray'> shape: (97,) charge <class 'jaxlib.xla_extension.DeviceArray'> shape: (66,) coulomb14scale <class 'jaxlib.xla_extension.DeviceArray'> [0.8333333] lj14scale <class 'jaxlib.xla_extension.DeviceArray'> [0.5] bcc <class 'jaxlib.xla_extension.DeviceArray'> [] vsite_types <class 'jaxlib.xla_extension.DeviceArray'> [] vsite_distances <class 'jaxlib.xla_extension.DeviceArray'> []
In the calculation of the defined potential functions, we require the following parameters:
coordinates: just use the coordinates in PDB file
box: We need to add the definition of the box since our PDB file does not contain this information. However, it is worth mentioning that if we use None for the box size, we can still obtain results because our system does not involve periodic boundaries.
pairs: The GAFF2 potential function requires the input of NeighborList to calculate nonbonded forces. Therefore, we can use the NeighborList class to obtain the pairs for the energy calculation.
Then, we can pass these to the get_energy
function, which is generated by the generator
from the parsed XML force field parameters in potentials.dmff_potentials. For example, we can calculate the energy associated with the NonbondedForce
interactions as follows:
-425.40488
For the system we defined above, the total energy can be calculated as follows:
To calculate the total energy using the previously defined potential, you can use the getPotentialFunc()
method, which will return the function for calculating the total energy.
-52.358948
So far, the methods we have used to calculate the system energy are similar to those in OpenMM.
The main advantage of using JAX as a computational backend is that we can use the jax.grad
function to obtain the gradients of a function. Its syntax is jax.grad(func, argnums)
, where it calculates the (partial) derivatives of the function with respect to the argument specified by argnums.
The interface of the total energy calculation function efunc that we obtained is [coordinates, box, bond pairs, force field parameters].
By applying the "differentiation of the function" operation, specifically taking the partial derivatives of the total energy with respect to the coordinates, we can compute the partial derivatives of the total energy with respect to the coordinates. These derivatives can then be used to calculate the forces acting on the atoms in the molecule.
[[ 803.5053 3400.27 -661.98413 ] [ 1150.3483 -2973.8022 4723.1094 ] [-1806.3174 -1691.2512 -4400.671 ] [-1975.181 -3796.7463 -2135.0186 ] [ 300.95966 -2505.3462 1973.2803 ] [ 1417.9027 4929.0312 690.1204 ] [-1214.6167 -752.06226 -728.6687 ] [ 506.18085 -1762.1459 -183.49246 ] [ 666.4907 -96.466 1381.9845 ] [-1764.406 -1823.6426 -1339.2583 ] [ 716.9508 2224.645 -611.07526 ] [ -668.10223 -2633.5884 -45.521576 ] [ 525.2708 1892.7621 -324.64764 ] [ -222.92917 -1252.5643 1038.3877 ] [ 1050.3809 1406.0747 859.39716 ] [ -150.03757 -816.84314 -112.414795 ] [-2460.543 1360.3517 -796.9952 ] [-1690.8473 -4545.091 67.75354 ] [ 2421.7532 2531.319 411.40106 ] [-1881.5691 2700.4482 -938.7882 ] [ 2230.6426 -2105.4688 996.7544 ] [ 2820.1313 3725.3066 531.66956 ] [ 1008.5734 -1133.9412 -123.15384 ] [ -276.46887 160.24274 -402.34088 ] [ 119.170166 482.1291 -187.77916 ] [ 350.39615 -1206.4073 97.62477 ] [ -646.07837 625.33057 662.3565 ] [ -848.41907 793.5155 -1179.742 ] [ -110.67904 -297.4613 1608.6833 ] [ 907.37463 -712.69055 -804.76746 ] [ -23.865623 -403.59476 221.90463 ] [ -974.9644 1265.961 34.98752 ] [ 1254.7183 408.12585 160.66191 ] [ -523.4277 797.085 -1009.95105 ] [ 384.23688 -426.2561 378.51938 ] [ -935.60455 511.5661 -1090.4733 ] [ -149.94562 -666.4287 1039.4056 ] [ 397.46478 -338.41284 339.30286 ] [ 414.70288 479.50735 767.1015 ] [ 965.32336 160.06296 13.4783325] [ -527.07196 650.70605 441.80896 ] [ 317.2796 998.7295 -60.578827 ] [ -273.489 918.78033 -523.65106 ] [ 1251.1185 930.4213 445.95312 ] [ -647.67566 -1616.5327 -86.54979 ] [-1666.081 -537.21234 -452.2054 ] [ -638.9363 -822.2604 -414.4256 ] [ 751.17883 454.1977 783.03796 ] [ 428.46597 -256.60504 -690.38293 ] [ -603.1392 231.0011 -406.5716 ] [ -831.8475 -148.27419 -253.46361 ] [ -156.61652 425.78406 815.5218 ] [ -181.66884 242.79425 13.229111 ] [ -527.74664 -599.4667 -484.45233 ] [ -955.4458 -160.5049 388.17126 ] [ -212.86476 -710.949 291.9231 ] [ 34.802746 149.08188 746.7626 ] [ -35.14848 -257.8299 -725.77954 ] [ -63.89473 318.1174 679.2535 ] [ 737.6649 43.31356 -182.68419 ] [ 723.8848 492.97958 -177.76422 ] [ 326.529 -76.563324 -903.23193 ] [ 8.246334 342.50415 700.3305 ] [ 721.10767 20.561882 -288.5298 ] [ -195.67831 -755.8612 93.2759 ] [ 128.55147 1809.5626 -670.13806 ]]
Similarly, we can also take derivatives with respect to the force field parameters. For example, if we want to optimize the parameters related to the nonbonded interactions, we can adjust the argnum parameter accordingly. By doing so, we can obtain the derivative of the nonbonded energy with respect to the charge parameter.
[ 652.77527 55.10861 729.3611 -171.49307 502.70825 -44.917175 129.63982 -142.31807 -149.62085 453.21515 46.37271 140.15305 575.4879 461.46902 294.4356 335.25162 27.828522 671.3639 390.8903 519.6834 220.51123 238.76968 229.97318 210.58838 237.0856 196.40985 231.87343 35.663544 457.7645 77.47992 256.54395 402.21216 592.4626 421.86688 -52.09668 440.84637 611.9574 237.98907 110.28607 150.65381 218.61093 240.2049 -211.85382 150.73334 310.89423 208.65222 -139.23056 -168.88818 114.364685 3.7261353 399.62814 298.2848 422.06445 485.14267 512.12665 549.8402 556.47235 394.4085 575.85754 606.74725 526.18463 521.27563 558.5588 560.4667 562.8122 333.7419 ]
1.3 [Review] Basic interface and key usage points of DMFF
1.3.1 Applications of DMFF:
Molecular Force Field Optimization Platform
- DMFF is compatible with OpenMM's XML format for molecular force field parameters
- Automatic differentiation and GPU support enable fast implementation of complex molecular dynamics force fields in DMFF
- flexible force field parameter optimization capability of DMFF:
- Improving a specific component among them×
- Complex parameter optimization√ Rapid implementation through automatic differentiation frameworks
1.3.2 General Operations
Generating Potential Functions
- Defining force parameters in an XML file (OpenMM interface)
- Description of the system in PDB and topology in XML format
pdb = app.PDBFile("lig.pdb")
ff = Hamiltonian("gaff-2.11.xml", "lig-prm.xml")
potentials = ff.createPotential(pdb.topology)
Calculating System's Energy and Forces
efunc = potentials.getPotentialFunc()
params = ff.getParameters()
# energy function
totene = efunc(positions, box, pairs, params)
print(totene)
# 力
pos_grad_func = jax.grad(efunc, argnums=0)
force = -pos_grad_func(positions, box, pairs, params)
print(force)
1.3.3 [More] Why do we need "differentiable" JAX: The first step towards infinite possibilities for new force fields.
Why JAX?
Here we take the example of the implementation of the HarmonicBondForce in DMFF to build a simple model to illustrate what we can do with the JAX backend of DMFF.
In this part we will use some customized files (modifying these files might help you understand what the DMFF frontend does):
we an create a new file dummy-prm.xml
<ForceField>
<Residues>
<Residue name="DUM">
<Atom name="C1" type="a" />
<Atom name="C2" type="a" />
<Bond atomName1="C1" atomName2="C2"/>
</Residue>
</Residues>
</ForceField>
dummy-top.xml
:
<Residues>
<Residue name='DUM'>
<Bond from='C1' to='C2'/>
</Residue>
</Residues>
dummy.pdb
:
REMARK 1 CREATED WITH MANUAL
HETATM 1 C1 DUM A 1 0.000 0.000 0.000 1.00 0.00 C
HETATM 2 C2 DUM A 1 1.000 0.000 0.000 1.00 0.00 C
TER 3 DUM A 1
END
<ForceField>
<AtomTypes>
<Type element="C" name="a" class="a" mass="12.01"/>
</AtomTypes>
<HarmonicBondForce>
<Bond type1="a" type2="a" length="0.10" k="100"/>
</HarmonicBondForce>
</ForceField>
dmff energy: 0.004999998 dmff force: [[ 0.99999976 -0. -0. ] [-0.99999976 -0. -0. ]] func energy: 0.004999998 func force: [[ 0.99999976 -0. -0. ] [-0.99999976 -0. -0. ]] param grad: [DeviceArray(-399.99988, dtype=float32, weak_type=True)] param grad, optimized: [DeviceArray(-0., dtype=float32, weak_type=True)]
- By taking derivatives with respect to coordinates, we can perform molecular dynamics (MD) simulations.
- By taking derivatives with respect to the box, we can perform NPT ensemble molecular dynamics (MD) simulations.
- By taking derivatives with respect to the parameters, we will be able to optimize the parameters of our defined force field using techniques such as gradient descent.
2. Generating and extending potential functions
This case is primarily contributed by Professor Kuang Yu
In the following example, using water as an illustration, we will demonstrate: 1. How to define a potential function using the DMFF frontend; 2. How to extend the backend function of a multipolar polarizable force field to introduce charge fluctuations with respect to the structure.
Navigate to the working directory
Let's briefly introduce the concept of a multipolar polarizable force field. In such force fields (e.g., AMOEBA, MPID), each atom is associated with a local coordinate system defined by its neighboring atoms. Within this local coordinate system, the atom can have a permanent multipole moment (). In addition to the permanent multipole moment, each atom also carries an induced dipole moment (), which responds to the electric field from the surrounding water molecules. The energy of the system, given a certain induced dipole moment, can be expressed as:
To compute the actual potential energy surface, it is necessary to adjust the induced dipole moments to minimize the energy:
In the DMFF ADMP module, we provide full support for such potential energy models. We start by writing the corresponding forcefield.xml
file:
<ForceField>
<AtomTypes>
<Type name="380" class="OW" element="O" mass="15.999"/>
<Type name="381" class="HW" element="H" mass="1.008"/>
</AtomTypes>
<Residues>
<Residue name="HOH">
<Atom name="H1" type="381"/>
<Atom name="H2" type="381"/>
<Atom name="O" type="380"/>
<Bond from="0" to="2"/>
<Bond from="1" to="2"/>
</Residue>
</Residues>
<ADMPPmeForce lmax="2"
mScale12="0.00" mScale13="0.00" mScale14="0.00" mScale15="1.00" mScale16="1.00"
pScale12="0.00" pScale13="0.00" pScale14="0.00" pScale15="1.00" pScale16="1.00"
dScale12="1.00" dScale13="1.00" dScale14="1.00" dScale15="1.00" dScale16="1.00">
<Atom type="380" kz="381" kx="-381"
c0="-0.803721"
dX="0.0" dY="0.0" dZ="-0.00784325"
qXX="0.000366476" qXY="0.0" qYY="-0.000381799" qXZ="0.0" qYZ="0.0" qZZ="1.53231e-05"
/>
<Atom type="381" kz="380" kx="381"
c0="0.401876"
dX="-0.00121713" dY="0.0" dZ="-0.00095895"
qXX="6.7161e-06" qXY="0.0" qYY="-3.37874e-05" qXZ="1.25905e-05" qYZ="0.0" qZZ="2.70713e-05"
/>
<Polarize type="380" polarizabilityXX="1.1249e-03" polarizabilityYY="1.1249e-03" polarizabilityZZ="1.1249e-03" thole="0.33"/>
<Polarize type="381" polarizabilityXX="2.6906e-04" polarizabilityYY="2.6906e-04" polarizabilityZZ="2.6906e-04" thole="0.33"/>
</ADMPPmeForce>
</ForceField>
As we can see, we have defined two atom types, O(380) and H(381), and defined the connectivity topology template for water molecules. In the <ADMPPmeForce>
section, we have defined the charges, dipoles, quadrupoles, and polarizabilities for the two atom types. Next, we create a Hamiltonian and use the molecular topology to build the potential energy surface:
efunc
represents the newly generated potential function. It's important to note that the createPotential
function returns a Potential
object. From this object, you can obtain the total potential energy function using pots.getPotentialFunc()
, and you can also access each individual force field term defined in the XML using pots.dmff_potentials
.
To use the newly obtained potential function, you can follow this approach:
-44.247986 [[ 26.789263 -28.855604 -23.824251 ] [-89.97108 71.58078 45.55526 ] [ -6.398665 12.705884 -2.0030034] [ 99.174255 -49.363472 7.53055 ] [-13.098161 9.614633 -20.819822 ] [-16.494581 -15.681494 -6.440734 ]]
Due to the JIT compilation, the first call may be slower, but the subsequent calls will be significantly faster.
-44.247986
Up to this point, we have completed the definition of the potential function in the frontend. The definition and usage of other potential functions are similar. Next, let's take a look at how we can extend this potential function.
In this case, we aim to introduce charge fluctuations into the existing multipolar polarizable force field. In the previous model, the permanent multipoles of all atoms were considered constant. However, we observe that the charges of water molecules vary with the changes in bond lengths and bond angles due to molecular vibrations. In other words, atomic parameters, including charges and dispersion coefficients, should ideally be functions of the molecular geometry. Although the induced dipoles from other molecules are already considered in the potential function, the charge fluctuations resulting from the structural changes of the molecule itself are not taken into account. To incorporate this into our model, we can define a new potential function based on the existing one.
First and foremost, we need to acknowledge that the previously defined frontend function is not suitable for such extensions. This is because the efunc
function's input is based on force field parameters rather than atomic parameters. We can examine the charge values in the params
variable within the efunc
function's input:
[-0.803721 0.401876]
It is evident that efunc
and gfunc
only accept charges for the O and H atoms, and they do not allow specifying individual charges for each atom (i.e., by default assigning the same charge to all O atoms). This is a characteristic of the DMFF frontend functions: they handle the typification process internally and do not expose per-atom parameters as inputs. However, in our example, we want to specify charges for each atom individually. Therefore, we need to call the backend functions, which are typically stored in the generator
object:
Help on CompiledFunction in module dmff.admp.pme: get_energy(positions, box, pairs, Q_local, pol, tholes, mScales, pScales, dScales, U_init=DeviceArray([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], dtype=float32))
It can be seen that the backend function's API design is more complex. This function is not responsible for typification and accepts the local multipole moments (Q_local
), polarizability (pol
), and the thole damping width (thole
) for each atom as inputs. Therefore, it offers greater flexibility for extension. Additionally, the generator
object also contains other information that we will need later, such as the atom_type
that records the atom types for each atom (we will use this variable for expanding the force field parameters):
[0 1 1 0 1 1]
We would like to implement the following computation process:
That is, introduce structural dependence into Q based on the positions:
To do this, let's define a function that calculates the charges based on the molecular structure:():
Then, let's define a function generator that combines the compute_charges
function and the pme_force.get_energy
function:
Next, we can calculate the corresponding energy and forces:
[[ 26.789217 -28.855394 -23.824253 ] [-89.971054 71.58072 45.555294 ] [ -6.3986936 12.705858 -2.0030873] [ 99.17415 -49.363495 7.5305967] [-13.098191 9.614621 -20.819775 ] [-16.494642 -15.681448 -6.4407396]] [[ 7.000556 -10.745702 7.454122 ] [-74.77005 57.814682 24.986063 ] [ 1.997407 4.605535 -13.862865 ] [ 63.033344 -40.308517 -15.54667 ] [ 2.6029735 6.5371037 -8.383862 ] [ 0.13687992 -17.902187 5.351119 ]]
By comparing the forces before and after, we can clearly see the effect of geometric fluctuations of atomic charges on the atomic forces.
In this case, we will demonstrate how to use DMFF in a bottom-up fitting approach.
Navigate to the working directory
The main objective of bottom-up fitting is energies and forces calculated from first-principle calculations. The definition of this object function is relatively straightforward. In this example, we will focus on fitting the Pauli Exchange interaction between two PEG dimers.
The intermolecular exchange repulsion term for this system can be written in the following form:
In this fitting process, the target is the exchange energy calculated using the SAPT (Symmetry-Adapted Perturbation Theory) first-principle method. Based on past experience, there is a strong correlation between the prefactor()and exponents()in this fitting, which can lead to overfitting and yield unrealistic parameters. Here, we aim to utilize the ADAM algorithm in machine learning for optimization, with the hope of obtaining physically reasonable parameters. The system consists of five atom types, which are defined in the following order according to the XML file:
C1, H1, O, C2, H2
All atomic parameters (&) will also be arranged in the same order as mentioned above.
And obtain the corresponding potential functions:
Note that in forcefield.xml
, there is a complete force field that contains many components. Here, we are only interested in the exchange part, so we extract the relevant potential function from it.
In the subsequent calculation of the loss function, we will process minibatches of structures as input. To speed up the computation, we can use jax.vmap
to vectorize the single potential calculation function into a batched energy calculation function.
During this process, we specify vectorization along the first axis of the first variable (positions
).
3.3 Preparing inputs for the potential function (such as neighbor lists, etc)
Next, we prepare the inputs required for these potential functions, with the most important one, the neighbor list. For a simple system with only two molecules, we can set rc
to a relatively large value to include all possible interactions. This way, we don't need to update the neighbor list due to structural changes, which will speed up the calculations.
First, we obtain the initial structure and box size:
(WARNING: All input functions in the ADMP module use Angstrom as the unit, which is different from the classical module where the default unit is nm.)
Then, construct a list of all atom pairs using the initial structure:
Next, obtain the initial parameters as the starting point for optimization:
The dataset is obtained from first-principle (SAPT) calculations and consists of 50 dimer scans. Each scan is identified by a unique scan ID and contains 12 structures. We can take a look at the storage structure of the dataset:
dict_keys(['180', '240', '320', '200', '220', '480', '000', '020', '040', '060', '080', '100', '120', '500', '140', '160', '260', '280', '300', '340', '360', '380', '400', '420', '440', '460', '520', '540', '560', '580', '600', '620', '640', '660', '680', '700', '720', '740', '760', '780', '800', '820', '840', '860', '880', '900', '920', '940', '960', '980'])
We only use the exchange energy data from the dataset.
[1.39466427e+02 8.08116248e+01 4.56703077e+01 2.52853430e+01 1.37681072e+01 3.93215704e+00 1.08843093e+00 2.95647950e-01 7.99345500e-02 1.13576600e-02 1.64950000e-03 2.42900000e-04]
(12, 16, 3)
In the subsequent fitting process, we will shuffle the order of all scans and use each scan's data as a minibatch input. We will use ADAM to train the parameters.
To start our optimization loop, we can iterate over the data for a specified number of epochs and update the parameters using the optimizer. After each epoch, we can save the parameters as a pickle file.
100%|██████████| 50/50 [03:42<00:00, 4.45s/it]
We will first perform 50 epochs of optimization, but of course, we can continue to optimize further to achieve better results. We can then examine whether the optimized parameters are physically reasonable.
[30.301695 37.96646 29.424974 29.878721 38.61278 ]
Finally, we can evaluate the fitting performance:
100%|██████████| 50/50 [00:09<00:00, 5.10it/s]
4.1 Introduction
The initial step in building a workflow using DMFF is to generate the potential function. The process and API are similar to OpenMM and generally follow the following steps:
- The first step in building a force field using DMFF is to define the force field object, known as the
Hamiltonian
. It encapsulates thetypification
rules and all the force field parameters. It can be seen as the equivalent of an XML file in OpenMM or an ITP file in GROMACS, but represented as a Python object. - Obtain the molecular bond connectivity information for the system you want to simulate (you can use a PDB file).
- Based on the system's connectivity topology, you can use the
Hamiltonian
to classify and parameterize each atom, bond length, bond angle, dihedral angle, etc., and create a potential function. The function takes inputs such as atomic positions, box size, and force field parameters, making it suitable for subsequent derivative calculations. - If necessary, the potential function can be modified and extended as needed. For example, using
jax.grad
, one can obtain a function to compute forces and the Virial matrix, which can be directly used in molecular dynamics (MD) simulations. Estimator
and loss functions can be defined based on the potential function to compute desired properties. The optimization of parameters can be performed using a gradient descent optimizer.
In DMFF, we provide JAX implementations of various potential functions commonly used in traditional physics force fields, such as PME, pairwise interactions, multipole moments, and polarizable force fields. With the help of JAX's transformation tools, users can differentiate, vectorize, compile, and modify these potential functions. They can also create new potential functions by re-encapsulating and recombining existing ones. Additionally, users can develop property estimators
based on these potential functions, and these new functions can also be further differentiated or vectorized. This function-centric programming paradigm in DMFF offers unbelievable flexibility in defining new force fields and objective functions.
To provide users with a comprehensive understanding of the main modules and functionalities in DMFF and how these modules work together in force field development, we will demonstrate the complete process of model definition, optimization, and deployment using a coarse-grained model of dimethyl carbonate (DMC). The goal of this task is to optimize the interactions between three sites representing the DMC chain (3-site model) and reproduce the radial distribution function (RDF) of liquid DMC obtained from X-ray diffraction (XRD) experiments. We will also compare the DMC coarse-grained model force field with the all-atom OPLS-AA force field.
Navigate to the working directory
[ 0.2841 -0.5682]
From the prm.xml
force field file, we can see that the function includes the following terms for intramolecular atomic interactions: bond, angle, and electrostatics, as well as intermolecular interactions represented by the Lennard-Jones potential:
And we will optimize some parameters.
The function generates an NPT ensemble using OpenMM based on the input parameters and saves the generated samples in the specified trajectory file. This function will be repeatedly called for resampling purposes during subsequent optimization steps.
#"Step","Density (g/mL)","Speed (ns/day)","Time Remaining" 20000,0.6262534623308863,0,-- 40000,0.5979051832690799,679,0:58 60000,0.6367384763452368,678,0:56 80000,0.6139587399937261,679,0:53 100000,0.6151387279860191,679,0:50 120000,0.5989298333138645,679,0:48 140000,0.6335292723148818,679,0:45 160000,0.6343532933548159,679,0:43 180000,0.6214166011781163,679,0:40 200000,0.6382065774390001,680,0:38 220000,0.6090503783169839,680,0:35 240000,0.6244315319426381,680,0:33 260000,0.613327148658729,680,0:30 280000,0.6114154616377299,680,0:27 300000,0.6164005785043938,680,0:25 320000,0.6360197331658057,680,0:22 340000,0.6158857376058046,680,0:20 360000,0.633164935263684,680,0:17 380000,0.6254281508591532,680,0:15 400000,0.6268376781117729,680,0:12 420000,0.6313746404948491,680,0:10 440000,0.6215903100117814,680,0:07 460000,0.6165084630785347,680,0:05 480000,0.6246335708795673,681,0:02 500000,0.6114394931719171,681,0:00
When calculating the ensemble average of an observable quantity , the following statistical quantities need to be computed: , code above has defined the function.
In general, we need to encapsulate in a function and then take its overall derivative during each optimization iteration. However, in this case, since the RDF and density are purely structural properties that are independent of and do not participate in the derivative calculation, we can precompute the RDF for each sample () and store it. It is not necessary to recalculate it in every iteration.
In the code above, we use the existing tools in mdtraj
to compute the RDF for each frame, rather than using a differentiable implementation with jax.。
We will compare the RDF results obtained from the RDF calculation function with the experimental data and the RDF results from the full-atom model in the OPLS-AA force field. Let's start by looking at the RDF generated from the initial parameter sampling, which serves as our starting point for optimization.
Up to now, we have completed the preparation for force field optimization. Before starting the optimization loop, we need to prepare some few tools. Here, we introduce the distinctive feature of DMFF version 0.2.0 - Property Estimator.
The computational cost of computing deep computational graphs that span the entire trajectory in molecular dynamics simulations is significantly high in terms of both time and computation. However, this drawback can be mitigated by a trajectory reweighting approach. In DMFF, the reweighting algorithm is introduced in the MBAR method, which extends the differentiable estimators for ensemble averages and free energies.
In the gradient descent training of parameter optimization, the parameters are only perturbed slightly in each training epoch, allowing us to continue using the samples from the previous epoch without significant deviation from the sampling ensemble. This eliminates the need for re-sampling until the optimized ensemble significantly deviates from the sampling ensemble, resulting in a substantial reduction in optimization time and computational cost.
In the reweighted MBAR Estimator, we define two ensembles: the sampling ensemble from which all samples are extracted, and the target ensemble that we want to optimize. The sampling ensemble is updated only when necessary and does not require differentiability. Its data can be generated by external samplers such as OpenMM. The Estimator provides weighting factors, allowing us to estimate the target ensemble by the weighted average of samples from the sampling ensemble.
The workflow can be understood using the following flowchart:
The Effective Sample Size is a measure of the degree to which the sampling ensemble deviates from the target ensemble. When the deviation becomes too large, the sampling ensemble is updated.
WARNING:root:Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Here we create MBAREstimator
and initialize it. There are two parts of variables that need to be initialized in MBAREstimator
:
- all sampling state information(
state
)and samples (sample
) ; - The partition function of the sampling state, estimated based on the samples and sampling state information. ( )
It can be observed that in the above code, we define an OpenMMSampleState
and add this state
and the samples generated from this state
to the estimator
. Then, the optimize_mbar
function is called (which internally calls the external pymbar
tool) to estimate the partition function of all the sampling states. It is important to note that these operations do not involve differentiation and do not depend on JAX. After initialization, the estimator
can directly provide in a differentiable manner.
Here, we define our target ensemble target_state
using the differentiable 3-site model potential function efunc
that was defined above. This target_state
is used as an input parameter for subsequent reweighting
. In the computation process, the estimator
will repeatedly call the efunc
from the target_state
to calculate the effective energy of the samples in the target ensemble. For the NVT ensemble, it is calculated as , and for the NPT ensemble, it is calculated as .
Here, we use the estimator.estimate_weight
function to generate the corresponding to the current target_state
. We then calculate the average RDF value under the current target_state
. Finally, we compare the calculated RDF with the experimental reference values to obtain the RDF object function:
At the same time, we have noticed that when optimizing RDF alone, the optimizer may explore non-physical parameter regions, leading to a decrease in system density and eventually crossing the phase boundary to become gas. To avoid this phenomenon, we also include density error in the loss function to control the system density. Therefore, we separately define the loss function for density:
In the following optimization loop, we will first perform several rounds of optimization for density and then proceed with the RDF.
As an old saying goes in China, to do a good job, one must first sharpen his tools (工欲善其事,必先利其器). In the era of rapid development of differentiable programming techniques driven by the wave of deep learning, we have witnessed a new paradigm in force field development. We aim to transform force field development into an engineering-oriented, automated, and reproducible process, allowing us to reap the benefits of continuous integration/development and the open-source spirit. DMFF, along with other related projects in DeepModeling community, will promote and implement this transformative change.
DMFF is currently in the early stages of rapid iteration and development. There are many areas that require further improvement and numerous possibilities worth exploring:
DMFF, along with projects like dflow, aims to implement common force field fitting workflows, such as fitting dihedral angles, free energies, probability distributions, and more. These workflows will be integrated into the software to provide users with a comprehensive suite of tools for force field development and parameterization.
DMFF is committed to continuous development and will strive to meet the evolving needs of the scientific community. This includes expanding the range of supported force field function forms to accommodate a wider variety of molecular systems.
DMFF recognizes the importance of integrating advanced molecular dynamics algorithms with force field development. By leveraging state-of-the-art sampling techniques and enhanced sampling methods, DMFF aims to enhance the accuracy and efficiency of force field optimization and simulations.
Absolutely! DMFF recognizes the importance of user experience and aims to continuously improve its documentation, API, and overall usability. Providing clear and comprehensive documentation is essential for users to understand the functionality and usage of DMFF effectively.
Welcome to write Issues, initiate Discussions, and even submit Pull Requests in the DMFF GitHub project. Specifically:
If you are a hardcore developer in the field of molecular force fields and are exploring new forms of force field functions, we welcome you to engage in in-depth discussions with the developers and contribute to enriching the force field calculation capabilities of DMFF.
If you are dedicated to simulating a specific system and are struggling to find suitable force field parameters, you can become an angel user of DMFF. You can use DMFF to build a force field optimization workflow based on your own needs and provide valuable suggestions to us based on your practical requirements. Your feedback will contribute to the further development and improvement of DMFF.