Bohrium
robot
新建

空间站广场

论文
Notebooks
比赛
课程
Apps
我的主页
我的Notebooks
我的论文库
我的足迹

我的工作空间

任务
节点
文件
数据集
镜像
项目
数据库
公开
Getting started with DMFF: A comprehensive beginner's tutorial
DMFF
English
MD
Tutorial
DMFFEnglishMD Tutorial
Feng Wei
yuk@dp.tech
发布于 2023-09-27
推荐镜像 :Third-party software:dmff:0.2.0-notebook
推荐机型 :c4_m8_cpu
1
Getting started with DMFF: A comprehensive beginner's tutorial.
Before You Run
What Is Molecular Force Field
Automatic DIfferentiation & DMFF
Table of Contents:
1. Quick Start Guide to DMFF
1.0 Import dependencies and prepare files
1.1 Load existing force field parameters and topology | OpenMM Frontend
1.2 Calculation | JAX Differentiable Backend
1.3 [Review] Basic interface and key usage points of DMFF
1.3.1 Applications of DMFF:
1.3.2 General Operations
Generating Potential Functions
Calculating System's Energy and Forces
1.3.3 [More] Why do we need "differentiable" JAX: The first step towards infinite possibilities for new force fields.
1.4 Code
2. Generating and extending potential functions
2.1 Multipolar Polarizable Force Field
2.2 Extending the potential function using backend functions
3. Bottom-Up Fitting
3.1 Problem introduction:
3.2 Definition of potential function
3.3 Preparing inputs for the potential function (such as neighbor lists, etc)
3.4 Load the data
3.5 Definition of loss function
3.6 Optimize the energy
4.Using DMFF for force field optimization of liquid dimethyl carbonate (DMC)
4.1 Introduction
4.2 Definition of potential function
4.3 Definition of OpenMM sampler
4.4 Initial MD sampling
4.5 Definition of property calculation functions
4.6 Read the data and perform the comparison
4.7 Estimator initialization
4.8 Definition of target ensemble
4.9 Definition of loss function
4.10 Optimizer setup and optimization loop
5. Summary & Outlook

Getting started with DMFF: A comprehensive beginner's tutorial.

In the simulation of molecular systems, the underlying force field (FF) model plays an extremely important role, determining the reliability of the simulation. However, the quality of the state-of-the-art molecular force fields is still unsatisfactory in many cases, and the FF parameterization process largely relies on human experience, which is not scalable. To address this issue, we introduce DMFF (Differentiable Molecular Force Field), an open-source molecular FF development platform based on automatic differentiation technique. Using DMFF, both energies/forces and thermodynamic quantities such as ensemble averages and free energies can be evaluated in a differentiable way, realizing an automatic, yet highly flexible force field optimization workflow

For more details, please refer to this link: https://mp.weixin.qq.com/s/eVXTr1eU1-dGbC5UFybr0g

Note: JAX, DMFF, and other related tools are still undergoing rapid development and iteration. This notebook has been successfully executed using DMFF 0.2.0.

Choose dmff:0.2.0-notebook and you can directly run this notebook on Bohrium

This notebook is primarily organized by Wei Feng.

代码
文本

Before You Run

代码
文本

Molecular systems are a highly important category of systems, encompassing various organic molecules such as biomolecules, drug molecules, various coatings used in industries, porous materials like COFs, polymers and small molecule electrolytes in batteries, and more. Problems of interest in industries, such as structure prediction of biomolecules, drug screening, screening and design of materials, heavily rely on molecular dynamics (MD) simulations of molecular systems. Although the applications of MD have extended beyond molecular systems, simulating molecular systems remains one of the core applications of MD.

代码
文本

MD simulations usually rely on a potential energy surface (PES), which describes the energies and the forces of the simulated atoms. For the sake of computational efficiency, the PES is typically approximated using a classical model, namely molecular force field, instead of being computed ab initioly on-the-fly. Therefore, the quality of the underlying force field limits the accuracy and the predictive power of the simulation.

So it's obvious that the core of simulation is fitting microscale interactions, or potential functions, to accurately predict macroscopic properties. In classical molecular dynamics, the potential function is expressed mathematically in terms of a molecular force field.

代码
文本

What Is Molecular Force Field

代码
文本

As we mentioned above, one of the most important aspects in simulating molecular systems is characterizing the system's potential energy function and describing the interactions within the system. In Classical Molecular Dynamics (CMD) simulations, the potential energy function follows a fixed mathematical form, and the force field provides the mathematical functions for intermolecular and intramolecular interaction potentials. The traditional force fields commonly used in the industry, such as OPLS and GAFF, generally share a similar form:

s

The total energy can be divided into bonded and nonbonded components. The bonded component naturally depends on the intramolecular coordinates (bond lengths, bond angles, dihedral angles), while the nonbonded component (further decomposable into van der Waals interactions and electrostatic interactions) naturally depends on the interatomic distances. This form is almost universally used as the standard form in the entire industry. However, the molecular force fields currently employed in industrial applications face several challenges:

  • Lack of portability and predictive capability: When studying new systems, it is often uncertain which force field will yield better results until actual simulations are performed. Apart from relying on "experience," it can be challenging to determine the appropriate force field selection criteria.
  • Parameter's non-uniqueness and inconsistency: It is common to encounter multiple sets of completely different parameters that yield similar macroscopic predictions. As a result, it is not possible to solely rely on macroscopic data to determine which set of parameters is more reasonable at the microscale. Moreover, similar molecular systems often have entirely different force field parameters, making it challenging to combine force fields developed by different research groups.

Over the past few decades, the development of empirical force fields and the improvement of computational accuracy have been both an age-old and emerging topic. The quest to accurately and efficiently describe atomic interactions and establish a technical roadmap for developing a molecular force field applicable to the majority of systems has been the focus of researchers in the field of molecular force fields and molecular dynamics.

The optimization of parameters in force fields has long relied on manual intervention and "empirical" parameter tuning methods. This reliance raises concerns about the reliability and efficiency of empirical force field fitting. However, with the advent of the artificial intelligence era, one underlying technology, automatic differentiation, offers a new solution. This technology has paved the way for the development of Differentiable Molecular Force Field (DMFF), which holds the promise of becoming a powerful tool for molecular force field developers.

代码
文本

Automatic DIfferentiation & DMFF

Automatic differentiation is a high-precision and versatile method for computing derivatives in computer programs. It follows the mathematical chain rule to compute derivatives of composite functions. By tracing the derivative chains of every data point based on the computation graph, it can calculate the differentiation of the output with respect to the input variables.

Automatic differentiation plays a crucial role in optimizing neural network models. During model training, it is necessary to compute the gradients of the output with respect to the input variables through backpropagation and utilize gradient descent to optimize the model parameters. Automatic differentiation frameworks excel at achieving this task. With their efficient optimization capabilities for high-dimensional parameters, automatic differentiation techniques can be applied not only to neural network models but also to any framework that follows the "input model parameters → apply model computation → obtain computed results" structure. Molecular dynamics simulations precisely follow this workflow.

Therefore, leveraging automatic differentiation techniques and utilizing experimental or first-principles computed data as references, it is possible to optimize force field parameters by calculating the differentiation of the output results with respect to the input parameters through backpropagation. This enables the optimization of force field parameters in molecular dynamics simulations, just as in the optimization of neural network models.

s

Several attempts have been made to harness automatic differentiation techniques for molecular dynamics simulations, such as TorchMD, JAX-MD, and SPONGE. However, the deep computational graph involved in molecular dynamics simulations often introduces additional challenges. The backpropagation process, which calculates the differentiation from the output results to the input variables, can be computationally expensive. Additionally, there is still a lack of simulation engines specifically designed for rapid implementation and parameter optimization of force fields. Molecular force field developers urgently require comprehensive support for a wider range of force field functional forms and various types of objective functions. Now DMFF comes.

s

DMFF provides a comprehensive and rapid implementation of force field models and offers differentiable estimators of system energy, forces, and thermodynamic quantities. These differentiable estimators allow for the definition of corresponding object functions, enabling an automatic optimization process.

As mentioned above, the deep computational graph spanning the entire trajectory in molecular dynamics simulations would be computationally expensive and time-consuming. However, this limitation can be mitigated by employing a reweighting scheme for the trajectory. In DMFF, the reweighting algorithm is incorporated into the MBAR method, and extends the differentiable estimators for average properties and free energy calculations. We will demonstrate this method with practical examples to showcase its effectiveness.

DMFF gives a solution to two major problems in molecular force field development:

  • How to ensure the rapid implementation and iteration of complex force fields in molecular dynamics simulations?
  • How to improve the efficiency of optimizing parameters in high-dimensional functions and automate this process, while increasing the transferability of the parameters?

In our latest release, DMFF 0.2.0, we have explored a pathway for automatic optimization of force field parameters. The corresponding workflow has been validated in both simple small molecular systems and more complex electrolyte systems. We will guide you through a step-by-step notebook experience, demonstrating the complete workflow of DMFF for automatic optimization of force field parameters. Furthermore, using the notebook as a teaching template, we will provide a real hands-on experience showcasing the impact of DMFF on force field development.

代码
文本

1. Quick Start Guide to DMFF

This case is primarily contributed by Yanbo Han.

1.0 Import dependencies and prepare files

代码
文本

First, we set up the runtime environment and import the potential functions needed.

代码
文本
[1]
#! pip install matplotlib pymbar optax
#! conda install mdtraj -y
! if [ ! -e DMFF ];then git clone https://gitee.com/deepmodeling/DMFF.git;fi
! git config --global --add safe.directory `pwd`/DMFF
! cd DMFF && git checkout 0.2.1
! export XLA_PYTHON_CLIENT_PREALLOCATE=FALSE
#! cd DMFF && python setup.py install
Already on 'devel'
Your branch is up to date with 'origin/devel'.
代码
文本

In addition to DMFF, we also need to use JAX and OpenMM.

  • OpenMM: Manages the force field files and parameter data
  • JAX:framework of differentiation

At the same time, we will also use other libraries and trajectory analysis software such as mdtraj in the subsequent examples. Let's import them as well.

代码
文本
[2]
import sys
import numpy as np
import jax
import jax_md
import jax.numpy as jnp
from jax import value_and_grad, jit, vmap
import openmm as mm
import openmm.app as app
import openmm.unit as unit
import dmff
from dmff import Hamiltonian, NeighborList
from dmff.api import Hamiltonian
from dmff.common import nblist
from dmff.optimize import MultiTransform, genOptimizer
from dmff.mbar import MBAREstimator, SampleState, TargetState, Sample, OpenMMSampleState, buildTrajEnergyFunction
import pickle
from pprint import pprint
import optax
import mdtraj as md
from itertools import combinations
import matplotlib.pyplot as plt
WARNING:pymbar.timeseries:Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.
代码
文本

Navigate to the working directory

代码
文本
[3]
import os
os.chdir(os.path.join("DMFF","examples", "classical"))
代码
文本

1.1 Load existing force field parameters and topology | OpenMM Frontend

If you are using a force field in OpenMM, you can load the force field parameters using the ForceField class, like:

app.Topology.loadBondDefinitions("lig-top.xml")
pdb = app.PDBFile('lig.pdb')
pdb_system = pdb.topology

forcefield = ForceField('someforcefield.xml')

In DMFF, there is a functionality class called Hamiltonian that is similar to OpenMM's ForceField class. It allows you to read force field parameters and define a more generalized system potential energy function. At the same time, it is compatible with reading existing force field parameters.

  • We can use OpenMM to read PDB and topology files:
    • topology(log-top.xml
    • PDB(lig.pdb
  • Load the force field parameters using DMFF's Hamiltonian class and create a differentiable potential energy function:
    • GAFF force field file (gaff-2.11.xml
    • Additional parameters for the corresponding molecule (assigned charges) (lig-prm.xml

The DMFF potential function, apart from the name "Hamiltonian," is used in a similar way to OpenMM. The XML files of OpenMM force fields can also be directly reused.

代码
文本
[4]
app.Topology.loadBondDefinitions("lig-top.xml")
pdb = app.PDBFile("lig.pdb")
ff = Hamiltonian("gaff-2.11.xml", "lig-prm.xml")
potentials = ff.createPotential(pdb.topology)
代码
文本

In DMFF, the parameters and calculations of the potential function are managed by JAX. For example, the DMFF potential function includes the re-implemented HarmonicBondForce, HarmonicAngleForce, PeriodicTorsionForce, and NonbondedForce. The parameters in DMFF are JAX-wrapped Array objects. For example, if we define:

jnp.array([1.])

You will obtain a DeviceArray type. This type has a similar interface to numpy.ndarray(), as both are high-performance arrays/matrices. However, unlike numpy, JAX's jax.numpy.array() can be extended to work with GPUs/TPUs, making it suitable for efficient automatic differentiation.

代码
文本
[5]
for k in potentials.dmff_potentials.keys():
pot = potentials.dmff_potentials[k]
print(k, pot)

params = ff.getParameters()
# print(params.keys()) # there are 4 kinds of Forces,'NonbondedForce', 'HarmonicBondForce', 'HarmonicAngleForce', 'PeriodicTorsionForce'
nbparam = params['NonbondedForce']
for k,v in nbparam.items():
print(k, type(v), v if v.shape[0]<10 else f"shape: {v.shape}")
HarmonicBondForce <function HarmonicBondJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d77f0>
HarmonicAngleForce <function HarmonicAngleJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d7c70>
PeriodicTorsionForce <function PeriodicTorsionJaxGenerator.createForce.<locals>.potential_fn at 0x7f330c3d7d00>
NonbondedForce <function NonbondedJaxGenerator.createForce.<locals>.potential_fn at 0x7f32ec5817e0>
sigfix <class 'jaxlib.xla_extension.DeviceArray'> []
epsfix <class 'jaxlib.xla_extension.DeviceArray'> []
sigma <class 'jaxlib.xla_extension.DeviceArray'> shape: (97,)
epsilon <class 'jaxlib.xla_extension.DeviceArray'> shape: (97,)
charge <class 'jaxlib.xla_extension.DeviceArray'> shape: (66,)
coulomb14scale <class 'jaxlib.xla_extension.DeviceArray'> [0.8333333]
lj14scale <class 'jaxlib.xla_extension.DeviceArray'> [0.5]
bcc <class 'jaxlib.xla_extension.DeviceArray'> []
vsite_types <class 'jaxlib.xla_extension.DeviceArray'> []
vsite_distances <class 'jaxlib.xla_extension.DeviceArray'> []
代码
文本

1.2 Calculation | JAX Differentiable Backend

代码
文本

In the calculation of the defined potential functions, we require the following parameters:

  • coordinates: just use the coordinates in PDB file

  • box: We need to add the definition of the box since our PDB file does not contain this information. However, it is worth mentioning that if we use None for the box size, we can still obtain results because our system does not involve periodic boundaries.

  • pairs: The GAFF2 potential function requires the input of NeighborList to calculate nonbonded forces. Therefore, we can use the NeighborList class to obtain the pairs for the energy calculation.

Then, we can pass these to the get_energy function, which is generated by the generator from the parsed XML force field parameters in potentials.dmff_potentials. For example, we can calculate the energy associated with the NonbondedForce interactions as follows:

代码
文本
[6]
positions = jnp.array(pdb.getPositions(asNumpy=True).value_in_unit(unit.nanometer))

box = jnp.array([
[10.0, 0.0, 0.0],
[ 0.0, 10.0, 0.0],
[ 0.0, 0.0, 10.0]
])
# box=None # you can also use this

nbList = NeighborList(box, r_cutoff=4, covalent_map=ff.getGenerators()[-1].covalent_map)
nbList.allocate(positions)
pairs = nbList.pairs

# The format of pairs is [atom_index1, atom_index2, nbond], where nbond is 0 if there is no bond between the atoms.
# print(pairs)

nbfunc = potentials.dmff_potentials['NonbondedForce']

# You can use inspect to examine the nbfunc function. The inspect.signature() method will tell us the input parameters of this function.
# import inspect
# print(inspect.signature(nbfunc))

nbene = nbfunc(positions, box, pairs, params)
print(nbene)
-425.40488
代码
文本

For the system we defined above, the total energy can be calculated as follows:

To calculate the total energy using the previously defined potential, you can use the getPotentialFunc() method, which will return the function for calculating the total energy.

代码
文本
[7]
efunc = potentials.getPotentialFunc()
params = ff.getParameters()
totene = efunc(positions, box, pairs, params)
print(totene)
-52.358948
代码
文本

So far, the methods we have used to calculate the system energy are similar to those in OpenMM.

代码
文本

The main advantage of using JAX as a computational backend is that we can use the jax.grad function to obtain the gradients of a function. Its syntax is jax.grad(func, argnums), where it calculates the (partial) derivatives of the function with respect to the argument specified by argnums.

The interface of the total energy calculation function efunc that we obtained is [coordinates, box, bond pairs, force field parameters].

代码
文本
[8]
# you can use inspect.signature to view the function interface signature
# import inspect
# print(inspect.signature(efunc))
代码
文本

By applying the "differentiation of the function" operation, specifically taking the partial derivatives of the total energy with respect to the coordinates, we can compute the partial derivatives of the total energy with respect to the coordinates. These derivatives can then be used to calculate the forces acting on the atoms in the molecule.

代码
文本
[9]
pos_grad_func = jax.grad(efunc, argnums=0)
force = -pos_grad_func(positions, box, pairs, params)
print(force)
[[  803.5053     3400.27       -661.98413  ]
 [ 1150.3483    -2973.8022     4723.1094   ]
 [-1806.3174    -1691.2512    -4400.671    ]
 [-1975.181     -3796.7463    -2135.0186   ]
 [  300.95966   -2505.3462     1973.2803   ]
 [ 1417.9027     4929.0312      690.1204   ]
 [-1214.6167     -752.06226    -728.6687   ]
 [  506.18085   -1762.1459     -183.49246  ]
 [  666.4907      -96.466      1381.9845   ]
 [-1764.406     -1823.6426    -1339.2583   ]
 [  716.9508     2224.645      -611.07526  ]
 [ -668.10223   -2633.5884      -45.521576 ]
 [  525.2708     1892.7621     -324.64764  ]
 [ -222.92917   -1252.5643     1038.3877   ]
 [ 1050.3809     1406.0747      859.39716  ]
 [ -150.03757    -816.84314    -112.414795 ]
 [-2460.543      1360.3517     -796.9952   ]
 [-1690.8473    -4545.091        67.75354  ]
 [ 2421.7532     2531.319       411.40106  ]
 [-1881.5691     2700.4482     -938.7882   ]
 [ 2230.6426    -2105.4688      996.7544   ]
 [ 2820.1313     3725.3066      531.66956  ]
 [ 1008.5734    -1133.9412     -123.15384  ]
 [ -276.46887     160.24274    -402.34088  ]
 [  119.170166    482.1291     -187.77916  ]
 [  350.39615   -1206.4073       97.62477  ]
 [ -646.07837     625.33057     662.3565   ]
 [ -848.41907     793.5155    -1179.742    ]
 [ -110.67904    -297.4613     1608.6833   ]
 [  907.37463    -712.69055    -804.76746  ]
 [  -23.865623   -403.59476     221.90463  ]
 [ -974.9644     1265.961        34.98752  ]
 [ 1254.7183      408.12585     160.66191  ]
 [ -523.4277      797.085     -1009.95105  ]
 [  384.23688    -426.2561      378.51938  ]
 [ -935.60455     511.5661    -1090.4733   ]
 [ -149.94562    -666.4287     1039.4056   ]
 [  397.46478    -338.41284     339.30286  ]
 [  414.70288     479.50735     767.1015   ]
 [  965.32336     160.06296      13.4783325]
 [ -527.07196     650.70605     441.80896  ]
 [  317.2796      998.7295      -60.578827 ]
 [ -273.489       918.78033    -523.65106  ]
 [ 1251.1185      930.4213      445.95312  ]
 [ -647.67566   -1616.5327      -86.54979  ]
 [-1666.081      -537.21234    -452.2054   ]
 [ -638.9363     -822.2604     -414.4256   ]
 [  751.17883     454.1977      783.03796  ]
 [  428.46597    -256.60504    -690.38293  ]
 [ -603.1392      231.0011     -406.5716   ]
 [ -831.8475     -148.27419    -253.46361  ]
 [ -156.61652     425.78406     815.5218   ]
 [ -181.66884     242.79425      13.229111 ]
 [ -527.74664    -599.4667     -484.45233  ]
 [ -955.4458     -160.5049      388.17126  ]
 [ -212.86476    -710.949       291.9231   ]
 [   34.802746    149.08188     746.7626   ]
 [  -35.14848    -257.8299     -725.77954  ]
 [  -63.89473     318.1174      679.2535   ]
 [  737.6649       43.31356    -182.68419  ]
 [  723.8848      492.97958    -177.76422  ]
 [  326.529       -76.563324   -903.23193  ]
 [    8.246334    342.50415     700.3305   ]
 [  721.10767      20.561882   -288.5298   ]
 [ -195.67831    -755.8612       93.2759   ]
 [  128.55147    1809.5626     -670.13806  ]]
代码
文本

Similarly, we can also take derivatives with respect to the force field parameters. For example, if we want to optimize the parameters related to the nonbonded interactions, we can adjust the argnum parameter accordingly. By doing so, we can obtain the derivative of the nonbonded energy with respect to the charge parameter.

代码
文本
[10]
param_grad_func = jax.grad(nbfunc, argnums=3, allow_int=True)
pgrad = param_grad_func(positions, box, pairs, params)
print(pgrad["NonbondedForce"]["charge"])
[ 652.77527     55.10861    729.3611    -171.49307    502.70825
  -44.917175   129.63982   -142.31807   -149.62085    453.21515
   46.37271    140.15305    575.4879     461.46902    294.4356
  335.25162     27.828522   671.3639     390.8903     519.6834
  220.51123    238.76968    229.97318    210.58838    237.0856
  196.40985    231.87343     35.663544   457.7645      77.47992
  256.54395    402.21216    592.4626     421.86688    -52.09668
  440.84637    611.9574     237.98907    110.28607    150.65381
  218.61093    240.2049    -211.85382    150.73334    310.89423
  208.65222   -139.23056   -168.88818    114.364685     3.7261353
  399.62814    298.2848     422.06445    485.14267    512.12665
  549.8402     556.47235    394.4085     575.85754    606.74725
  526.18463    521.27563    558.5588     560.4667     562.8122
  333.7419   ]
代码
文本

1.3 [Review] Basic interface and key usage points of DMFF

1.3.1 Applications of DMFF:

Molecular Force Field Optimization Platform

  • DMFF is compatible with OpenMM's XML format for molecular force field parameters
  • Automatic differentiation and GPU support enable fast implementation of complex molecular dynamics force fields in DMFF
  • flexible force field parameter optimization capability of DMFF
    • Improving a specific component among them×
    • Complex parameter optimization√ Rapid implementation through automatic differentiation frameworks

1.3.2 General Operations

Generating Potential Functions

  • Defining force parameters in an XML file (OpenMM interface)
  • Description of the system in PDB and topology in XML format

pdb = app.PDBFile("lig.pdb")
ff = Hamiltonian("gaff-2.11.xml", "lig-prm.xml")
potentials = ff.createPotential(pdb.topology)

Calculating System's Energy and Forces


efunc = potentials.getPotentialFunc()
params = ff.getParameters()

# energy function
totene = efunc(positions, box, pairs, params)
print(totene)

# 力
pos_grad_func = jax.grad(efunc, argnums=0)
force = -pos_grad_func(positions, box, pairs, params)
print(force)

1.3.3 [More] Why do we need "differentiable" JAX: The first step towards infinite possibilities for new force fields.

Why JAX?

Here we take the example of the implementation of the HarmonicBondForce in DMFF to build a simple model to illustrate what we can do with the JAX backend of DMFF.

代码
文本

In this part we will use some customized files (modifying these files might help you understand what the DMFF frontend does):

we an create a new file dummy-prm.xml


<ForceField>
    <Residues>
        <Residue name="DUM">
            <Atom name="C1" type="a" />
            <Atom name="C2" type="a" />
            <Bond atomName1="C1" atomName2="C2"/>
        </Residue>
    </Residues>
</ForceField>

dummy-top.xml


<Residues>
    <Residue name='DUM'>
        <Bond from='C1' to='C2'/>
    </Residue>
</Residues>

dummy.pdb


REMARK   1 CREATED WITH MANUAL
HETATM    1  C1  DUM A   1       0.000   0.000   0.000  1.00  0.00           C  
HETATM    2  C2  DUM A   1       1.000   0.000   0.000  1.00  0.00           C  
TER       3      DUM A   1
END


<ForceField>
  <AtomTypes>
    <Type element="C" name="a" class="a" mass="12.01"/>
    </AtomTypes>
  <HarmonicBondForce>
    <Bond type1="a" type2="a" length="0.10" k="100"/>
  </HarmonicBondForce>
</ForceField>

代码
文本
[12]
# from dmff/classical/intra.py
import jax.numpy as jnp
from jax import value_and_grad
from dmff.classical.intra import distance
'''
def distance(p1v, p2v):
return jnp.sqrt(jnp.sum(jnp.power(p1v - p2v, 2), axis=1))

class HarmonicBondJaxForce:
def __init__(self, p1idx, p2idx, prmidx):
self.p1idx = p1idx
self.p2idx = p2idx
self.prmidx = prmidx
self.refresh_calculators()

def generate_get_energy(self):
def get_energy(positions, box, pairs, k, length):
p1 = positions[self.p1idx,:]
p2 = positions[self.p2idx,:]
kprm = k[self.prmidx]
b0prm = length[self.prmidx]
dist = distance(p1, p2)
return jnp.sum(0.5 * kprm * jnp.power(dist - b0prm, 2))

return get_energy

def update_env(self, attr, val):
"""
Update the environment of the calculator
"""
setattr(self, attr, val)
self.refresh_calculators()

def refresh_calculators(self):
"""
refresh the energy and force calculators according to the current environment
"""
self.get_energy = self.generate_get_energy()
self.get_forces = value_and_grad(self.get_energy)
'''
import openmm.app as app
app.Topology.loadBondDefinitions("dummy-top.xml")
pdb = app.PDBFile('dummy.pdb')
ff = Hamiltonian("dummy.xml","dummy-prm.xml")
potentials = ff.createPotential(pdb.topology)
efunc = potentials.getPotentialFunc()
params = ff.getParameters()
totene = efunc(jnp.array([[0,0,0],[0.11,0,0]]), box=None, pairs=[[0,1,1]], params=params)
print("dmff energy:",totene)
pos_grad_func = jax.grad(efunc, argnums=0)
force = -pos_grad_func(jnp.array([[0,0,0],[0.11,0,0]]), box=None, pairs=[[0,1,1]], params=params)
print("dmff force:",force)

def get_energy(positions, box, pairs, k, length):
p1 = positions[[0],:]
p2 = positions[[1],:]
kprm = k[0]
b0prm = length[0]
dist = distance(p1, p2)
return jnp.sum(0.5 * kprm * jnp.power(dist - b0prm, 2))

pos = jnp.array([[0,0,0],[0.11,0,0]])
box = None
pairs = [[0,1,1]]
k = [100]
length = [0.1]

print("func energy:", get_energy(pos,box,pairs,k,length))

pos_grad_func = jax.grad(get_energy, argnums=0)
print("func force:", -pos_grad_func(pos,box,pairs,k,length))

def get_rmse_force(pos,box,pairs,k,length):
return jnp.sum(jnp.power(pos_grad_func(pos,box,pairs,k,length),2))

param_grad_func = jax.grad(get_rmse_force, argnums=4)
print("param grad:", param_grad_func(pos,box,pairs,k,length))
print("param grad, optimized:", param_grad_func(pos,box,pairs,k,[0.11]))
dmff energy: 0.004999998
dmff force: [[ 0.99999976 -0.         -0.        ]
 [-0.99999976 -0.         -0.        ]]
func energy: 0.004999998
func force: [[ 0.99999976 -0.         -0.        ]
 [-0.99999976 -0.         -0.        ]]
param grad: [DeviceArray(-399.99988, dtype=float32, weak_type=True)]
param grad, optimized: [DeviceArray(-0., dtype=float32, weak_type=True)]
代码
文本
  • By taking derivatives with respect to coordinates, we can perform molecular dynamics (MD) simulations.
  • By taking derivatives with respect to the box, we can perform NPT ensemble molecular dynamics (MD) simulations.
  • By taking derivatives with respect to the parameters, we will be able to optimize the parameters of our defined force field using techniques such as gradient descent.
代码
文本

2. Generating and extending potential functions

This case is primarily contributed by Professor Kuang Yu

In the following example, using water as an illustration, we will demonstrate: 1. How to define a potential function using the DMFF frontend; 2. How to extend the backend function of a multipolar polarizable force field to introduce charge fluctuations with respect to the structure.

代码
文本

Navigate to the working directory

代码
文本
[13]
import os
current_directory = os.getcwd()
parent_directory = os.path.dirname(current_directory)
os.chdir(parent_directory)
os.chdir(os.path.join("fluctuated_leading_term_waterff"))
代码
文本

2.1 Multipolar Polarizable Force Field

代码
文本

Let's briefly introduce the concept of a multipolar polarizable force field. In such force fields (e.g., AMOEBA, MPID), each atom is associated with a local coordinate system defined by its neighboring atoms. Within this local coordinate system, the atom can have a permanent multipole moment (). In addition to the permanent multipole moment, each atom also carries an induced dipole moment (), which responds to the electric field from the surrounding water molecules. The energy of the system, given a certain induced dipole moment, can be expressed as:

To compute the actual potential energy surface, it is necessary to adjust the induced dipole moments to minimize the energy:

In the DMFF ADMP module, we provide full support for such potential energy models. We start by writing the corresponding forcefield.xml file:

<ForceField>
 <AtomTypes>
  <Type name="380" class="OW" element="O" mass="15.999"/>
  <Type name="381" class="HW" element="H" mass="1.008"/>
 </AtomTypes>
 <Residues>
  <Residue name="HOH">
   <Atom name="H1" type="381"/>
   <Atom name="H2" type="381"/>
   <Atom name="O" type="380"/>
   <Bond from="0" to="2"/>
   <Bond from="1" to="2"/>
 </Residue>
 </Residues>

 <ADMPPmeForce lmax="2"
     mScale12="0.00" mScale13="0.00" mScale14="0.00" mScale15="1.00" mScale16="1.00"
     pScale12="0.00" pScale13="0.00" pScale14="0.00" pScale15="1.00" pScale16="1.00"
     dScale12="1.00" dScale13="1.00" dScale14="1.00" dScale15="1.00" dScale16="1.00">
   <Atom type="380" kz="381" kx="-381"
             c0="-0.803721"
             dX="0.0" dY="0.0"  dZ="-0.00784325"
             qXX="0.000366476" qXY="0.0" qYY="-0.000381799" qXZ="0.0" qYZ="0.0" qZZ="1.53231e-05"
             />
   <Atom type="381" kz="380" kx="381"
             c0="0.401876"
             dX="-0.00121713" dY="0.0"  dZ="-0.00095895"
             qXX="6.7161e-06" qXY="0.0" qYY="-3.37874e-05" qXZ="1.25905e-05" qYZ="0.0" qZZ="2.70713e-05"
             />
   <Polarize type="380" polarizabilityXX="1.1249e-03" polarizabilityYY="1.1249e-03" polarizabilityZZ="1.1249e-03" thole="0.33"/>
   <Polarize type="381" polarizabilityXX="2.6906e-04" polarizabilityYY="2.6906e-04" polarizabilityZZ="2.6906e-04" thole="0.33"/>
 </ADMPPmeForce>
</ForceField>

As we can see, we have defined two atom types, O(380) and H(381), and defined the connectivity topology template for water molecules. In the <ADMPPmeForce> section, we have defined the charges, dipoles, quadrupoles, and polarizabilities for the two atom types. Next, we create a Hamiltonian and use the molecular topology to build the potential energy surface:

代码
文本
[14]
H = Hamiltonian('forcefield.xml')
app.Topology.loadBondDefinitions("residues.xml")
pdb = app.PDBFile("water_dimer.pdb")
pots = H.createPotential(pdb.topology, nonbondedCutoff=10.0*unit.angstrom, step_pol=5)
efunc = pots.dmff_potentials['ADMPPmeForce']

# and you can jit it
efunc = jit(efunc)
# you can also define the gradient function
gfunc = jit(value_and_grad(efunc, argnums=(0)))
代码
文本

efunc represents the newly generated potential function. It's important to note that the createPotential function returns a Potential object. From this object, you can obtain the total potential energy function using pots.getPotentialFunc(), and you can also access each individual force field term defined in the XML using pots.dmff_potentials.

To use the newly obtained potential function, you can follow this approach:

代码
文本
[15]
# ADMP assumes the unit of Angstrom
positions = jnp.array(pdb.positions._value) * 10
box = jnp.array(pdb.topology.getPeriodicBoxVectors()._value) * 10
# construct neighbor list for water dimer
nbl = nblist.NeighborListFreud(box, 10.0, pots.meta['cov_map'])
pairs = nbl.allocate(positions, box)
params = H.getParameters()

# run the function
E, g0 = gfunc(positions, box, pairs, params)
print(E)
print(g0)
-44.247986
[[ 26.789263  -28.855604  -23.824251 ]
 [-89.97108    71.58078    45.55526  ]
 [ -6.398665   12.705884   -2.0030034]
 [ 99.174255  -49.363472    7.53055  ]
 [-13.098161    9.614633  -20.819822 ]
 [-16.494581  -15.681494   -6.440734 ]]
代码
文本

Due to the JIT compilation, the first call may be slower, but the subsequent calls will be significantly faster.

代码
文本
[16]
E, g0 = gfunc(positions, box, pairs, params)
print(E)
-44.247986
代码
文本

Up to this point, we have completed the definition of the potential function in the frontend. The definition and usage of other potential functions are similar. Next, let's take a look at how we can extend this potential function.

代码
文本

2.2 Extending the potential function using backend functions

代码
文本

In this case, we aim to introduce charge fluctuations into the existing multipolar polarizable force field. In the previous model, the permanent multipoles of all atoms were considered constant. However, we observe that the charges of water molecules vary with the changes in bond lengths and bond angles due to molecular vibrations. In other words, atomic parameters, including charges and dispersion coefficients, should ideally be functions of the molecular geometry. Although the induced dipoles from other molecules are already considered in the potential function, the charge fluctuations resulting from the structural changes of the molecule itself are not taken into account. To incorporate this into our model, we can define a new potential function based on the existing one.

First and foremost, we need to acknowledge that the previously defined frontend function is not suitable for such extensions. This is because the efunc function's input is based on force field parameters rather than atomic parameters. We can examine the charge values in the params variable within the efunc function's input:

代码
文本
[17]
# print charges
print(params['ADMPPmeForce']['Q_local'][:, 0])
[-0.803721  0.401876]
代码
文本

It is evident that efunc and gfunc only accept charges for the O and H atoms, and they do not allow specifying individual charges for each atom (i.e., by default assigning the same charge to all O atoms). This is a characteristic of the DMFF frontend functions: they handle the typification process internally and do not expose per-atom parameters as inputs. However, in our example, we want to specify charges for each atom individually. Therefore, we need to call the backend functions, which are typically stored in the generator object:

代码
文本
[18]
# fetch generator
pme_generator = H.getGenerators()[-1]
# fetch backend function
get_energy = jit(pme_generator.pme_force.get_energy)
help(get_energy)
Help on CompiledFunction in module dmff.admp.pme:

get_energy(positions, box, pairs, Q_local, pol, tholes, mScales, pScales, dScales, U_init=DeviceArray([[0., 0., 0.],
             [0., 0., 0.],
             [0., 0., 0.],
             [0., 0., 0.],
             [0., 0., 0.],
             [0., 0., 0.]], dtype=float32))

代码
文本

It can be seen that the backend function's API design is more complex. This function is not responsible for typification and accepts the local multipole moments (Q_local), polarizability (pol), and the thole damping width (thole) for each atom as inputs. Therefore, it offers greater flexibility for extension. Additionally, the generator object also contains other information that we will need later, such as the atom_type that records the atom types for each atom (we will use this variable for expanding the force field parameters):

代码
文本
[19]
print(pme_generator.map_atomtype)
[0 1 1 0 1 1]
代码
文本

We would like to implement the following computation process:

s

That is, introduce structural dependence into Q based on the positions:

To do this, let's define a function that calculates the charges based on the molecular structure:():

代码
文本
[20]
# compute charges from bond lengths and angle
from dmff.admp.spatial import v_pbc_shift

def compute_charges(positions,box):
n_atoms = len(positions)
c0 = jnp.zeros(n_atoms)
box_inv = jnp.linalg.inv(box)
O = positions[::3]
H1 = positions[1::3]
H2 = positions[2::3]
ROH1 = H1 - O
ROH2 = H2 - O
ROH1 = v_pbc_shift(ROH1, box, box_inv)
ROH2 = v_pbc_shift(ROH2, box, box_inv)
dROH1 = jnp.linalg.norm(ROH1, axis=1)
dROH2 = jnp.linalg.norm(ROH2, axis=1)
costh = jnp.sum(ROH1 * ROH2, axis=1) / (dROH1 * dROH2)
angle = jnp.arccos(costh)*180/jnp.pi
dipole1 = -0.016858755+0.002287251*angle + 0.239667591*dROH1 + (-0.070483437)*dROH2
charge_H1 = dipole1/dROH1
dipole2 = -0.016858755+0.002287251*angle + 0.239667591*dROH2 + (-0.070483437)*dROH1
charge_H2 = dipole2/dROH2
charge_O = -(charge_H1 + charge_H2)
c0 = c0.at[::3].set(charge_O)
c0 = c0.at[1::3].set(charge_H1)
c0 = c0.at[2::3].set(charge_H2)
return c0
代码
文本

Then, let's define a function generator that combines the compute_charges function and the pme_force.get_energy function:

代码
文本
[21]
def generate_calculator(get_energy, pme_generator):
def potential(positions, box, pairs):
params = pme_generator.paramtree['ADMPPmeForce']
map_atomtype = pme_generator.map_atomtype
# get charges
c0 = compute_charges(positions,box)
Q_local = params["Q_local"][map_atomtype]
# change fixed charges to fluctuated ones
Q_local = Q_local.at[:,0].set(c0)
# prepare other inputs
pol = params["pol"][map_atomtype]
tholes = params["tholes"][map_atomtype]

# all backend kernels
E_pme = pme_generator.pme_force.get_energy(
positions,
box,
pairs,
Q_local,
pol,
tholes,
params["mScales"], params["pScales"], params["dScales"]
)
return E_pme
# return energy calculator
return potential

efunc1 = generate_calculator(get_energy, pme_generator)
gfunc1 = jit(value_and_grad(efunc1, argnums=(0)))
代码
文本

Next, we can calculate the corresponding energy and forces:

代码
文本
[22]
E, g1 = gfunc1(positions, box, pairs)
代码
文本
[23]
print(g0)
print(g1)
[[ 26.789217  -28.855394  -23.824253 ]
 [-89.971054   71.58072    45.555294 ]
 [ -6.3986936  12.705858   -2.0030873]
 [ 99.17415   -49.363495    7.5305967]
 [-13.098191    9.614621  -20.819775 ]
 [-16.494642  -15.681448   -6.4407396]]
[[  7.000556   -10.745702     7.454122  ]
 [-74.77005     57.814682    24.986063  ]
 [  1.997407     4.605535   -13.862865  ]
 [ 63.033344   -40.308517   -15.54667   ]
 [  2.6029735    6.5371037   -8.383862  ]
 [  0.13687992 -17.902187     5.351119  ]]
代码
文本

By comparing the forces before and after, we can clearly see the effect of geometric fluctuations of atomic charges on the atomic forces.

代码
文本

3. Bottom-Up Fitting

This case is primarily contributed by Junmin Chen

代码
文本

In this case, we will demonstrate how to use DMFF in a bottom-up fitting approach.

Navigate to the working directory

代码
文本
[24]
import os
current_directory = os.getcwd()
parent_directory = os.path.dirname(current_directory)
os.chdir(parent_directory)
os.chdir(os.path.join("peg_slater_isa"))
代码
文本

3.1 Problem introduction:

代码
文本

The main objective of bottom-up fitting is energies and forces calculated from first-principle calculations. The definition of this object function is relatively straightforward. In this example, we will focus on fitting the Pauli Exchange interaction between two PEG dimers.

s

The intermolecular exchange repulsion term for this system can be written in the following form:

In this fitting process, the target is the exchange energy calculated using the SAPT (Symmetry-Adapted Perturbation Theory) first-principle method. Based on past experience, there is a strong correlation between the prefactor()and exponents()in this fitting, which can lead to overfitting and yield unrealistic parameters. Here, we aim to utilize the ADAM algorithm in machine learning for optimization, with the hope of obtaining physically reasonable parameters. The system consists of five atom types, which are defined in the following order according to the XML file:

C1, H1, O, C2, H2

All atomic parameters (&) will also be arranged in the same order as mentioned above.

代码
文本

3.2 Definition of potential function

We define the relevant Hamiltonian (we will define different Hamiltonians and potential functions for the dimer and the two monomers separately):

代码
文本
[26]
ff = 'forcefield.xml'
pdb_AB = app.PDBFile('peg2_dimer.pdb')
pdb_A = app.PDBFile('peg2.pdb')
pdb_B = app.PDBFile('peg2.pdb')
H_AB = Hamiltonian(ff)
H_A = Hamiltonian(ff)
H_B = Hamiltonian(ff)
代码
文本

And obtain the corresponding potential functions:

代码
文本
[29]
rc = 15

pots_AB = H_AB.createPotential(pdb_AB.topology, nonbondedCutoff=rc*unit.angstrom, nonbondedMethod=app.CutoffPeriodic, ethresh=1e-4)
pots_A = H_A.createPotential(pdb_A.topology, nonbondedCutoff=rc*unit.angstrom, nonbondedMethod=app.CutoffPeriodic, ethresh=1e-4)
pots_B = H_B.createPotential(pdb_B.topology, nonbondedCutoff=rc*unit.angstrom, nonbondedMethod=app.CutoffPeriodic, ethresh=1e-4)
pot_ex_AB = pots_AB.dmff_potentials['SlaterExForce']
pot_ex_A = pots_A.dmff_potentials['SlaterExForce']
pot_ex_B = pots_B.dmff_potentials['SlaterExForce']
代码
文本

Note that in forcefield.xml, there is a complete force field that contains many components. Here, we are only interested in the exchange part, so we extract the relevant potential function from it.

In the subsequent calculation of the loss function, we will process minibatches of structures as input. To speed up the computation, we can use jax.vmap to vectorize the single potential calculation function into a batched energy calculation function.

代码
文本
[30]
pot_ex_AB_v = jit(vmap(pot_ex_AB, in_axes=(0, None, None, None), out_axes=(0)))
pot_ex_A_v = jit(vmap(pot_ex_A, in_axes=(0, None, None, None), out_axes=(0)))
pot_ex_B_v = jit(vmap(pot_ex_B, in_axes=(0, None, None, None), out_axes=(0)))
代码
文本

During this process, we specify vectorization along the first axis of the first variable (positions).

3.3 Preparing inputs for the potential function (such as neighbor lists, etc)

Next, we prepare the inputs required for these potential functions, with the most important one, the neighbor list. For a simple system with only two molecules, we can set rc to a relatively large value to include all possible interactions. This way, we don't need to update the neighbor list due to structural changes, which will speed up the calculations.

First, we obtain the initial structure and box size:

WARNING: All input functions in the ADMP module use Angstrom as the unit, which is different from the classical module where the default unit is nm.)

代码
文本
[31]
rc = 15

pos_AB0 = jnp.array(pdb_AB.positions._value) * 10
n_atoms = len(pos_AB0)
n_atoms_A = n_atoms // 2
n_atoms_B = n_atoms // 2
pos_A0 = jnp.array(pdb_AB.positions._value[:n_atoms_A]) * 10
pos_B0 = jnp.array(pdb_AB.positions._value[n_atoms_A:n_atoms]) * 10
box = jnp.array(pdb_AB.topology.getPeriodicBoxVectors()._value) * 10
代码
文本

Then, construct a list of all atom pairs using the initial structure:

代码
文本
[32]
nbl_AB = nblist.NeighborList(box, rc, pots_AB.meta['cov_map'])
nbl_AB.allocate(pos_AB0)
pairs_AB = nbl_AB.pairs
nbl_A = nblist.NeighborList(box, rc, pots_A.meta['cov_map'])
nbl_A.allocate(pos_A0)
pairs_A = nbl_A.pairs
nbl_B = nblist.NeighborList(box, rc, pots_B.meta['cov_map'])
nbl_B.allocate(pos_B0)
pairs_B = nbl_B.pairs
代码
文本

Next, obtain the initial parameters as the starting point for optimization:

代码
文本
[33]
params = H_AB.getParameters()
代码
文本

3.4 Load the data

代码
文本
[34]
with open('data_sr.pickle', 'rb') as ifile:
data = pickle.load(ifile)
代码
文本

The dataset is obtained from first-principle (SAPT) calculations and consists of 50 dimer scans. Each scan is identified by a unique scan ID and contains 12 structures. We can take a look at the storage structure of the dataset:

代码
文本
[35]
# print all scan ids (sid)
print(data.keys())
dict_keys(['180', '240', '320', '200', '220', '480', '000', '020', '040', '060', '080', '100', '120', '500', '140', '160', '260', '280', '300', '340', '360', '380', '400', '420', '440', '460', '520', '540', '560', '580', '600', '620', '640', '660', '680', '700', '720', '740', '760', '780', '800', '820', '840', '860', '880', '900', '920', '940', '960', '980'])
代码
文本

We only use the exchange energy data from the dataset.

代码
文本
[36]
# Check out the exchange energy of the first scan
scan_res = data['000']
print(scan_res['ex'])
[1.39466427e+02 8.08116248e+01 4.56703077e+01 2.52853430e+01
 1.37681072e+01 3.93215704e+00 1.08843093e+00 2.95647950e-01
 7.99345500e-02 1.13576600e-02 1.64950000e-03 2.42900000e-04]
代码
文本
[37]
# Check out the structure of this scan
print(scan_res['posA'].shape)
(12, 16, 3)
代码
文本

In the subsequent fitting process, we will shuffle the order of all scans and use each scan's data as a minibatch input. We will use ADAM to train the parameters.

代码
文本

3.5 Definition of loss function

To define the energy loss function for the intermolecular exchange repulsion interaction, we can use the following formulation:

代码
文本
[38]
@jit
def MSELoss(params, scan_res):
'''
The weighted mean squared error loss function
Conducted for each scan
'''

E_ref = scan_res['ex']
E_tot_full = scan_res['tot_full']
kT = 2.494 # 300 K = 2.494 kJ/mol
weights_pts = jnp.piecewise(E_tot_full, [E_tot_full<25, E_tot_full>=25], [lambda x: jnp.array(1.0), lambda x: jnp.exp(-(x-25)/kT)])
npts = len(weights_pts)

pos_A = jnp.array(scan_res['posA'])
pos_B = jnp.array(scan_res['posB'])
pos_AB = jnp.concatenate([pos_A, pos_B], axis=1)

E_AB = pot_ex_AB_v(pos_AB, box, pairs_AB, params)
E_A = pot_ex_A_v(pos_A, box, pairs_A, params)
E_B = pot_ex_B_v(pos_B, box, pairs_B, params)
dE = E_AB - E_A - E_B - E_ref
MSE = jnp.sum(dE**2 * weights_pts) / jnp.sum(weights_pts)

return MSE
MSELoss_grad = jit(value_and_grad(MSELoss, argnums=(0)))
代码
文本

In this function, we have added a statistical weight to each data point:

By assigning weights to the data points, we can effectively ignore those with significantly higher total energies, which may not be important for the actual simulations.

3.6 Optimize the energy

Then initialize our optimizer:

代码
文本
[39]
multiTrans = MultiTransform(params)
multiTrans["SlaterExForce/A"] = genOptimizer(learning_rate=0.1)
multiTrans["SlaterExForce/B"] = genOptimizer(learning_rate=0.1)
multiTrans.finalize()
grad_transform = optax.multi_transform(multiTrans.transforms, multiTrans.labels)
opt_state = grad_transform.init(params)
代码
文本

To start our optimization loop, we can iterate over the data for a specified number of epochs and update the parameters using the optimizer. After each epoch, we can save the parameters as a pickle file.

代码
文本
[40]
# the log file
n_epochs = 50

from tqdm import tqdm

with open("log", "w") as logfile:
sids = sids = np.array(list(data.keys()))
for i_epoch in tqdm(range(n_epochs)):
np.random.shuffle(sids)
for sid in sids:
loss, g = MSELoss_grad(params, data[sid])
print(loss, file=logfile)
logfile.flush()
updates, opt_state = grad_transform.update(g, opt_state, params=params)
params = optax.apply_updates(params, updates)
with open('params.pickle', 'wb') as ofile:
pickle.dump(params, ofile)
100%|██████████| 50/50 [03:42<00:00,  4.45s/it]
代码
文本

We will first perform 50 epochs of optimization, but of course, we can continue to optimize further to achieve better results. We can then examine whether the optimized parameters are physically reasonable.

代码
文本
[41]
print(params["SlaterExForce"]["B"])
[30.301695 37.96646  29.424974 29.878721 38.61278 ]
代码
文本

Finally, we can evaluate the fitting performance:

代码
文本
[42]
energies = []
energies_ref = []
for sid in tqdm(data.keys()):
scan_res = data[sid]
E_tot_full = scan_res['tot_full']
kT = 2.494 # 300 K = 2.494 kJ/mol
weights_pts = jnp.piecewise(E_tot_full, [E_tot_full<25, E_tot_full>=25], [lambda x: jnp.array(1.0), lambda x: jnp.exp(-(x-25)/kT)])
pos_A = jnp.array(scan_res['posA'])
pos_B = jnp.array(scan_res['posB'])
pos_AB = jnp.concatenate([pos_A, pos_B], axis=1)
E_AB = pot_ex_AB_v(pos_AB, box, pairs_AB, params)
E_A = pot_ex_A_v(pos_A, box, pairs_A, params)
E_B = pot_ex_B_v(pos_B, box, pairs_B, params)
E_ex = E_AB - E_A - E_B
E_ref = scan_res['ex']
npts = len(E_ref)

for ipt in range(npts):
if weights_pts[ipt] > 1e-2:
energies.append(E_ex[ipt])
energies_ref.append(E_ref[ipt])

energies = np.array(energies)
energies_ref = np.array(energies_ref)
100%|██████████| 50/50 [00:09<00:00,  5.10it/s]
代码
文本
[43]
import matplotlib.pyplot as plt
plt.xlabel("Reference Energy (kJ/mol)")
plt.ylabel("Fitted Energy (kJ/mol)")
plt.axline((0, 0), slope=1, linewidth=1.5, color="k",alpha=0.8)
plt.scatter(energies_ref, energies)
代码
文本

4.Using DMFF for force field optimization of liquid dimethyl carbonate (DMC)

In the following case, we will provide a practical application of DMFF and build the corresponding workflow.

This case is primarily contributed by Wei Feng

代码
文本

4.1 Introduction

The initial step in building a workflow using DMFF is to generate the potential function. The process and API are similar to OpenMM and generally follow the following steps:

  1. The first step in building a force field using DMFF is to define the force field object, known as the Hamiltonian. It encapsulates the typification rules and all the force field parameters. It can be seen as the equivalent of an XML file in OpenMM or an ITP file in GROMACS, but represented as a Python object.
  2. Obtain the molecular bond connectivity information for the system you want to simulate (you can use a PDB file).
  3. Based on the system's connectivity topology, you can use the Hamiltonian to classify and parameterize each atom, bond length, bond angle, dihedral angle, etc., and create a potential function. The function takes inputs such as atomic positions, box size, and force field parameters, making it suitable for subsequent derivative calculations.
  4. If necessary, the potential function can be modified and extended as needed. For example, using jax.grad, one can obtain a function to compute forces and the Virial matrix, which can be directly used in molecular dynamics (MD) simulations.
  5. Estimator and loss functions can be defined based on the potential function to compute desired properties. The optimization of parameters can be performed using a gradient descent optimizer.

In DMFF, we provide JAX implementations of various potential functions commonly used in traditional physics force fields, such as PME, pairwise interactions, multipole moments, and polarizable force fields. With the help of JAX's transformation tools, users can differentiate, vectorize, compile, and modify these potential functions. They can also create new potential functions by re-encapsulating and recombining existing ones. Additionally, users can develop property estimators based on these potential functions, and these new functions can also be further differentiated or vectorized. This function-centric programming paradigm in DMFF offers unbelievable flexibility in defining new force fields and objective functions.

To provide users with a comprehensive understanding of the main modules and functionalities in DMFF and how these modules work together in force field development, we will demonstrate the complete process of model definition, optimization, and deployment using a coarse-grained model of dimethyl carbonate (DMC). The goal of this task is to optimize the interactions between three sites representing the DMC chain (3-site model) and reproduce the radial distribution function (RDF) of liquid DMC obtained from X-ray diffraction (XRD) experiments. We will also compare the DMC coarse-grained model force field with the all-atom OPLS-AA force field.

s

代码
文本

Navigate to the working directory

代码
文本
[44]
import os
current_directory = os.getcwd()
parent_directory = os.path.dirname(current_directory)
os.chdir(parent_directory)
os.chdir(os.path.join("DMC"))
代码
文本

4.2 Definition of potential function

Load the topology of the DMC system and construct the potential function for the simulation box:

代码
文本
[45]
from tqdm import tqdm
from functools import partialmethod
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)

# define potentials
H = Hamiltonian("prm1.xml")
rc = 0.9
top_pdb = app.PDBFile("box_DMC.pdb")
pot = H.createPotential(top_pdb.topology,
nonbondedMethod=app.PME,
nonbondedCutoff=rc*unit.nanometer,
useDispersionCorrection=False)
efunc = pot.getPotentialFunc()
代码
文本
[46]
params = H.getParameters()
print(params['NonbondedForce']['charge'])
[ 0.2841 -0.5682]
代码
文本

From the prm.xml force field file, we can see that the function includes the following terms for intramolecular atomic interactions: bond, angle, and electrostatics, as well as intermolecular interactions represented by the Lennard-Jones potential:

And we will optimize some parameters.

代码
文本

4.3 Definition of OpenMM sampler

代码
文本
[47]
# OpenMM sampler, for resampling during optimization
def sample_with_prm(parameter, trajectory, init_struct="box_DMC.pdb"):
pdb = app.PDBFile(init_struct)
ff = app.ForceField(parameter)
system = ff.createSystem(pdb.topology,
nonbondedMethod=app.PME,
nonbondedCutoff=rc*unit.nanometer,
constraints=None)
barostat = mm.MonteCarloBarostat(1.0*unit.bar, 293.0*unit.kelvin, 20)
#barostat.setRandomNumberSeed(12345)
system.addForce(barostat)
for force in system.getForces():
if isinstance(force, mm.NonbondedForce):
force.setUseDispersionCorrection(False)
force.setUseSwitchingFunction(False)
integ = mm.LangevinIntegrator(293*unit.kelvin, 5/unit.picosecond, 1*unit.femtosecond)

simulation = app.Simulation(pdb.topology, system, integ)
simulation.reporters.append(app.DCDReporter(trajectory, 4000))
simulation.reporters.append(app.StateDataReporter(sys.stdout, 20000, density=True, step=True, remainingTime=True, speed=True, totalSteps=500*1000))

simulation.context.setPositions(pdb.getPositions())
simulation.minimizeEnergy()
simulation.step(500*1000)
代码
文本

The function generates an NPT ensemble using OpenMM based on the input parameters and saves the generated samples in the specified trajectory file. This function will be repeatedly called for resampling purposes during subsequent optimization steps.

代码
文本

4.4 Initial MD sampling

代码
文本
[48]
# init sampling
sample_with_prm("prm1.xml", "init.dcd")
#"Step","Density (g/mL)","Speed (ns/day)","Time Remaining"
20000,0.6262534623308863,0,--
40000,0.5979051832690799,679,0:58
60000,0.6367384763452368,678,0:56
80000,0.6139587399937261,679,0:53
100000,0.6151387279860191,679,0:50
120000,0.5989298333138645,679,0:48
140000,0.6335292723148818,679,0:45
160000,0.6343532933548159,679,0:43
180000,0.6214166011781163,679,0:40
200000,0.6382065774390001,680,0:38
220000,0.6090503783169839,680,0:35
240000,0.6244315319426381,680,0:33
260000,0.613327148658729,680,0:30
280000,0.6114154616377299,680,0:27
300000,0.6164005785043938,680,0:25
320000,0.6360197331658057,680,0:22
340000,0.6158857376058046,680,0:20
360000,0.633164935263684,680,0:17
380000,0.6254281508591532,680,0:15
400000,0.6268376781117729,680,0:12
420000,0.6313746404948491,680,0:10
440000,0.6215903100117814,680,0:07
460000,0.6165084630785347,680,0:05
480000,0.6246335708795673,681,0:02
500000,0.6114394931719171,681,0:00
代码
文本

4.5 Definition of property calculation functions

In the fitting process, we need to simultaneously fit the Radial Distribution Function (RDF) and density. Therefore, we will define separate calculation functions for RDF and density:

代码
文本
[49]
# define property calculator, in our case, rdf for each frame:
def compute_rdf_frame(traj, xaxis):
rdf_list = []
delta = xaxis[1] - xaxis[0]

coordinates = traj.xyz
masses = np.array([15, 15, 60]) # mass of each site
coordinates_3d = coordinates.reshape((traj.n_frames, 175, 3, 3))
com = np.sum(masses[:, np.newaxis] * coordinates_3d, axis=2) / 90

pairs = np.array(list(combinations(range(175), 2)))

tidx = np.arange(0, 525, 3, dtype=int)
tsub = traj.atom_slice(tidx)
tsub.xyz = com
for frame in tsub:
_, g_r = md.compute_rdf(frame, pairs, r_range=(xaxis[0]-0.5*delta, xaxis[-1]+0.5*delta+1e-10), bin_width=delta)
rdf_list.append(g_r.reshape((1, -1)))
return np.concatenate(rdf_list, axis=0)


def compute_den_frame(traj):
vols = traj.unitcell_volumes # angstrom
return 26.1627907 / vols # convert to g/mL
代码
文本

When calculating the ensemble average of an observable quantity , the following statistical quantities need to be computed: , code above has defined the function.

In general, we need to encapsulate in a function and then take its overall derivative during each optimization iteration. However, in this case, since the RDF and density are purely structural properties that are independent of and do not participate in the derivative calculation, we can precompute the RDF for each sample () and store it. It is not necessary to recalculate it in every iteration.

In the code above, we use the existing tools in mdtraj to compute the RDF for each frame, rather than using a differentiable implementation with jax.。

代码
文本

4.6 Read the data and perform the comparison

Read the experimental data and pre-compute the RDF for the DMC full-atom model in the OPLS-AA force field (generated by LigPargen) as comparison:

代码
文本
[50]
# Prepare reference data
def readRDF(fname):
with open(fname, "r") as f:
data = np.array([[float(j) for j in i.strip().split()] for i in f])
xaxis = np.linspace(2.0, 14.0, 121)
yinterp = np.interp(xaxis, data[:,0], data[:,1])
return xaxis, yinterp

# read experimental benzene RDF
x_ref, y_ref = readRDF("DMC_Experi.txt")
m_ref, n_ref = readRDF("DMC_OPLS.txt")
代码
文本

We will compare the RDF results obtained from the RDF calculation function with the experimental data and the RDF results from the full-atom model in the OPLS-AA force field. Let's start by looking at the RDF generated from the initial parameter sampling, which serves as our starting point for optimization.

代码
文本
[51]
traj_init = md.load("init.dcd", top="box_DMC.pdb")[50:]
rdf_frames_init = compute_rdf_frame(traj_init, x_ref*0.1)
rdf_init = rdf_frames_init.mean(axis=0)

plt.plot(x_ref, rdf_init, label = "Initial")
#plt.plot(x_ref, y_ref, label = "Experiment")
plt.plot(m_ref, n_ref, label = "OPLS-AA")

plt.legend()
plt.show()
代码
文本

Up to now, we have completed the preparation for force field optimization. Before starting the optimization loop, we need to prepare some few tools. Here, we introduce the distinctive feature of DMFF version 0.2.0 - Property Estimator.

代码
文本

The computational cost of computing deep computational graphs that span the entire trajectory in molecular dynamics simulations is significantly high in terms of both time and computation. However, this drawback can be mitigated by a trajectory reweighting approach. In DMFF, the reweighting algorithm is introduced in the MBAR method, which extends the differentiable estimators for ensemble averages and free energies.

s

代码
文本

In the gradient descent training of parameter optimization, the parameters are only perturbed slightly in each training epoch, allowing us to continue using the samples from the previous epoch without significant deviation from the sampling ensemble. This eliminates the need for re-sampling until the optimized ensemble significantly deviates from the sampling ensemble, resulting in a substantial reduction in optimization time and computational cost.

In the reweighted MBAR Estimator, we define two ensembles: the sampling ensemble from which all samples are extracted, and the target ensemble that we want to optimize. The sampling ensemble is updated only when necessary and does not require differentiability. Its data can be generated by external samplers such as OpenMM. The Estimator provides weighting factors, allowing us to estimate the target ensemble by the weighted average of samples from the sampling ensemble.

The workflow can be understood using the following flowchart:

s

代码
文本

The Effective Sample Size is a measure of the degree to which the sampling ensemble deviates from the target ensemble. When the deviation becomes too large, the sampling ensemble is updated.

代码
文本

4.7 Estimator initialization

代码
文本
[52]
# Initialize estimator
estimator = MBAREstimator()
# prepare sampling state and samples
state_name = "prm"
state = OpenMMSampleState(state_name, "prm1.xml", "box_DMC.pdb", temperature=293.0, pressure=1.0)
traj = md.load("init.dcd", top="box_DMC.pdb")[50:]
sample = Sample(traj, state_name)
# add sample and the state to the estimator
estimator.add_state(state)
estimator.add_sample(sample)
estimator.optimize_mbar()
WARNING:root:Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
代码
文本

Here we create MBAREstimator and initialize it. There are two parts of variables that need to be initialized in MBAREstimator:

  • all sampling state information(state)and samples sample) ;
  • The partition function of the sampling state, estimated based on the samples and sampling state information. ( )

It can be observed that in the above code, we define an OpenMMSampleState and add this state and the samples generated from this state to the estimator. Then, the optimize_mbar function is called (which internally calls the external pymbar tool) to estimate the partition function of all the sampling states. It is important to note that these operations do not involve differentiation and do not depend on JAX. After initialization, the estimator can directly provide in a differentiable manner.

代码
文本

4.8 Definition of target ensemble

代码
文本
[53]
# Prepare the target state
cov_map = pot.meta['cov_map']
# Define effective energy function
target_energy_function = buildTrajEnergyFunction(efunc, cov_map, rc, ensemble='npt', pressure=1.0)
target_state = TargetState(293.0, target_energy_function)
代码
文本

Here, we define our target ensemble target_state using the differentiable 3-site model potential function efunc that was defined above. This target_state is used as an input parameter for subsequent reweighting. In the computation process, the estimator will repeatedly call the efunc from the target_state to calculate the effective energy of the samples in the target ensemble. For the NVT ensemble, it is calculated as , and for the NPT ensemble, it is calculated as .

代码
文本

4.9 Definition of loss function

代码
文本
[54]
# Define loss function
rdf_frames = compute_rdf_frame(estimator._full_samples, x_ref*0.1)
den_frames = compute_den_frame(estimator._full_samples)

def neutralize(param):
net_q = jnp.dot(params['NonbondedForce']['charge'], jnp.array([2.0, 1.0]))
param['NonbondedForce']['charge'] = param['NonbondedForce']['charge'] - net_q / 3.0
return param

def lossfunc(param):
# neutralize charge
param = neutralize(param)
weight, utarget = estimator.estimate_weight(target_state, parameters=param)
rdf_pert = (rdf_frames * weight.reshape((-1, 1))).sum(axis=0)
loss_ref = jnp.log(jnp.power(rdf_pert - n_ref, 2).mean())
# loss function of density
den = (den_frames * weight).sum()
loss_den = (den - 1.07)**2 * 10.0
return loss_ref + loss_den, utarget

def lossfunc_den(param):
# neutralize charge
param = neutralize(param)
weight, utarget = estimator.estimate_weight(target_state, parameters=param)
# loss function of density
den = (den_frames * weight).sum()
loss_den = (den - 1.07)**2 * 20.0
return loss_den, utarget
代码
文本

Here, we use the estimator.estimate_weight function to generate the corresponding to the current target_state. We then calculate the average RDF value under the current target_state. Finally, we compare the calculated RDF with the experimental reference values to obtain the RDF object function:

At the same time, we have noticed that when optimizing RDF alone, the optimizer may explore non-physical parameter regions, leading to a decrease in system density and eventually crossing the phase boundary to become gas. To avoid this phenomenon, we also include density error in the loss function to control the system density. Therefore, we separately define the loss function for density:

In the following optimization loop, we will first perform several rounds of optimization for density and then proceed with the RDF.

代码
文本

4.10 Optimizer setup and optimization loop

代码
文本
[55]
# Prepare optimizer
multiTrans = MultiTransform(H.paramtree)
multiTrans["NonbondedForce/sigma"] = genOptimizer(learning_rate=0.005, clip=0.05)
multiTrans["NonbondedForce/epsilon"] = genOptimizer(learning_rate=0.005, clip=0.05)
multiTrans["NonbondedForce/charge"] = genOptimizer(learning_rate=0.001, clip=0.05)
# multiTrans["HarmonicBondForce/k"] = genOptimizer(learning_rate=10.0, clip=10.0)
#multiTrans["HarmonicAngleForce/k"] = genOptimizer(learning_rate=5.0, clip=1.0)
multiTrans.finalize()
grad_transform = optax.multi_transform(multiTrans.transforms, multiTrans.labels)
mask = jax.tree_util.tree_map(lambda x: x.dtype != jnp.int32 and x.dtype != int, params)
grad_transform = optax.masked(grad_transform, mask)
opt_state = grad_transform.init(H.paramtree)
代码
文本
[56]
loss_list = []
traj_init = md.load("init.dcd", top="box_DMC.pdb")[50:]
rdf_frames_init = compute_rdf_frame(traj_init, x_ref*0.1)
rdf_init = rdf_frames_init.mean(axis=0)

# Total number of loops
NL = 60
# Number of density loops
NL_den = 0

for nloop in range(1, NL+1):
print("LOOP", nloop)
if nloop <= NL_den:
(loss, utarget), g = jax.value_and_grad(lossfunc_den, 0, has_aux=True, allow_int=True)(H.paramtree)
else:
(loss, utarget), g = jax.value_and_grad(lossfunc, 0, has_aux=True, allow_int=True)(H.paramtree)
loss_list.append(loss)
print("Loss:", loss)
sys.stdout.flush()
# evaluate effective sample size
ieff = estimator.estimate_effective_sample(utarget, decompose=True)

# update parameter use optax adam optimizer
updates, opt_state = grad_transform.update(g, opt_state, params=H.paramtree)
# deal with integer updates
updates['NonbondedForce']['vsite_types'] = np.array([], dtype=jnp.int32)
newprm = optax.apply_updates(H.paramtree, updates)
H.updateParameters(newprm)

# print out the updated parameter, save it in a xml
H.paramtree = neutralize(H.paramtree)
H.render(f"loop-{nloop}.xml")

# checkout the effective size of each sample in the estimator
print("Effective sample sizes:")
for k, v in ieff.items():
print(f"{k}: {v}")

# remove all state with a sample size smaller than 5
for k, v in ieff.items():
if v < 15 and k != "Total":
estimator.remove_state(k)
# if all the states are removed, add a new state.
if len(estimator.states) < 1:
print("Add", f"loop-{nloop}")
# get new sample using the current state
sample_with_prm(f"loop-{nloop}.xml", f"loop-{nloop}.dcd")
traj = md.load(f"loop-{nloop}.dcd", top="box_DMC.pdb")[50:]
state = OpenMMSampleState(f"loop-{nloop}", f"loop-{nloop}.xml", "box_DMC.pdb", temperature=293.0, pressure=1.0)
sample = Sample(traj, f"loop-{nloop}")
estimator.add_state(state)
estimator.add_sample(sample)
# estimator need to be reconverged whenenver new samples or states are added
estimator.optimize_mbar()
# property of each sample need to be updated as well
rdf_frames = compute_rdf_frame(estimator._full_samples, x_ref*0.1)
den_frames = compute_den_frame(estimator._full_samples)
rdf_frames = compute_rdf_frame(traj, x_ref*0.1)
plt.plot(m_ref, n_ref, label = "OPLS-AA")
# plt.plot(x_ref, y_ref, label = "Experiment")
plt.plot(x_ref, rdf_frames.mean(axis=0), label = "Current")
plt.legend()
plt.title(f"Loop-{nloop}")
# plt.savefig("compare.png")
plt.show()
代码
文本
[57]
sample_with_prm(f"loop-{NL}.xml", f"loop-{NL}.dcd")
traj = md.load(f"loop-{NL}.dcd", top="box_DMC.pdb")[50:]
rdf_final = compute_rdf_frame(traj, x_ref*0.1).mean(axis=0)

plt.plot(x_ref, rdf_init, label = "Initial")
plt.plot(m_ref, n_ref, label = "OPLS-AA")
#plt.plot(x_ref, y_ref, label = "Experiment")
plt.plot(x_ref, rdf_final, label = "Current")
plt.legend()
plt.title(f"Final")
# plt.savefig("compare.png")
plt.show()
代码
文本

5. Summary & Outlook

代码
文本

As an old saying goes in China, to do a good job, one must first sharpen his tools (工欲善其事,必先利其器). In the era of rapid development of differentiable programming techniques driven by the wave of deep learning, we have witnessed a new paradigm in force field development. We aim to transform force field development into an engineering-oriented, automated, and reproducible process, allowing us to reap the benefits of continuous integration/development and the open-source spirit. DMFF, along with other related projects in DeepModeling community, will promote and implement this transformative change.

代码
文本

DMFF is currently in the early stages of rapid iteration and development. There are many areas that require further improvement and numerous possibilities worth exploring:

  • DMFF, along with projects like dflow, aims to implement common force field fitting workflows, such as fitting dihedral angles, free energies, probability distributions, and more. These workflows will be integrated into the software to provide users with a comprehensive suite of tools for force field development and parameterization.

  • DMFF is committed to continuous development and will strive to meet the evolving needs of the scientific community. This includes expanding the range of supported force field function forms to accommodate a wider variety of molecular systems.

  • DMFF recognizes the importance of integrating advanced molecular dynamics algorithms with force field development. By leveraging state-of-the-art sampling techniques and enhanced sampling methods, DMFF aims to enhance the accuracy and efficiency of force field optimization and simulations.

  • Absolutely! DMFF recognizes the importance of user experience and aims to continuously improve its documentation, API, and overall usability. Providing clear and comprehensive documentation is essential for users to understand the functionality and usage of DMFF effectively.

代码
文本

Welcome to write Issues, initiate Discussions, and even submit Pull Requests in the DMFF GitHub project. Specifically:

  • If you are a hardcore developer in the field of molecular force fields and are exploring new forms of force field functions, we welcome you to engage in in-depth discussions with the developers and contribute to enriching the force field calculation capabilities of DMFF.

  • If you are dedicated to simulating a specific system and are struggling to find suitable force field parameters, you can become an angel user of DMFF. You can use DMFF to build a force field optimization workflow based on your own needs and provide valuable suggestions to us based on your practical requirements. Your feedback will contribute to the further development and improvement of DMFF.

代码
文本
DMFF
English
MD
Tutorial
DMFFEnglishMD Tutorial
点个赞吧
本文被以下合集收录
DMFF Notebook合集
Feng Wei
更新于 2024-07-17
13 篇17 人关注
推荐阅读
公开
Getting started with DMFF 1.0.0
DMFF
DMFF
Feng Wei
发布于 2023-11-07
2 赞
公开
最新发布!从零开始学习DMFF 1.0.0副本
DMFF Tutorial中文分子动力学
DMFF Tutorial中文分子动力学
bohr2ee4a1
发布于 2024-06-01