Notebook for DPA-2: a large atomic model as a multi-task learner
2043899742@qq.com
AIS-Square
Updated on 2024-09-14
Recommended image: DeePMD-kit:3.0.0b3-cuda12.1
Recommended machine: c8_m32_1 * NVIDIA V100
dpa2-finetune-example-water(v2)
dpa2_data(v6)


©️ Copyright 2023 @ Authors
Authors: Xinzijian Liu📨 Chengqian Zhang📨
Date: 2023-12-20
License: Attribution-NonCommercial-ShareAlike 4.0 International
Quick start: Click the blue Connect button at the top of the page, select the `registry.dp.tech/dptech/deepmd-kit:3.0.0b3-cuda12.1` image and the `c12_m92_1 * NVIDIA V100` machine, mount the `dpa2_data(v6)` and `dpa2-finetune-example-water(v2)` datasets, and wait a few moments for the node to start before running the notebook.


Introduction


Paper link: https://arxiv.org/abs/2312.15492

The code, datasets and input scripts are all available on Zenodo (https://doi.org/10.5281/zenodo.10428497).


This notebook was updated on 2024-09-14 for the beta3 (3.0.0b3) version of the code.


The rapid development of artificial intelligence (AI) is driving significant changes in the field of atomic modeling, simulation, and design. AI-based potential energy models have been successfully used to perform large-scale and long-time simulations with the accuracy of ab initio electronic structure methods. However, the model generation process still hinders applications at scale. We envision that the next stage would be a model-centric ecosystem, in which a large atomic model (LAM), pre-trained with as many atomic datasets as possible and can be efficiently fine-tuned and distilled to downstream tasks, would serve the new infrastructure of the field of molecular modeling. We propose DPA-2, a novel architecture for a LAM, and develop a comprehensive pipeline for model fine-tuning, distillation, and application, associated with automatic workflows. We show that DPA-2 can accurately represent a diverse range of chemical systems and materials, enabling high-quality simulations and predictions with significantly reduced efforts compared to traditional methods. Our approach paves the way for a universal large atomic model that can be widely applied in molecular and material simulation research, opening new opportunities for scientific discoveries and industrial applications.


In order to run this notebook successfully, let's do some preparatory work first.

[2]
%%bash
cd /root/
cp -r /bohr/qscft-rbns/v6/src/ ./
tree -L 3
.
└── src
    ├── data
    │   ├── FerroEle_train
    │   ├── FerroEle_valid
    │   ├── H2O-PD_train
    │   ├── H2O-PD_valid
    │   ├── SemiCond_train
    │   └── SemiCond_valid
    ├── md
    │   └── water_192
    ├── model
    │   ├── H2O-PD.pt
    │   └── OpenLAM_2.2.0_27heads_beta3.pt
    └── train
        ├── finetune
        ├── multitask
        └── singletask

15 directories, 2 files
  • data: This directory contains three datasets: FerroEle, H2O-PD and SemiCond. FerroEle is a small subset of the dataset FerroEle_DPA_v1_0, H2O-PD is a small subset of H2O-PD_DPA_v1_0, and SemiCond is a small subset of SemiCond_DPA_v1_0. The full datasets can be downloaded from the AIS Square website.
  • model: This directory contains one singletask model H2O-PD.pt and one multitask model OpenLAM_2.2.0_27heads_beta3.pt. The multitask model OpenLAM_2.2.0_27heads_beta3.pt is trained on 27 different datasets.
  • train: This directory contains three modes of training, which are singletask training, multitask training, and finetuning based on pretrained model. In a moment, we're going to demonstrate how to do these trainings in the three folders.
  • md: In this directory, we will demonstrate how to perform molecular dynamics simulation using DPA-2 model.

Model Loading


Trained singletask or multitask models shared by others can be downloaded from the AIS Square website.

[3]
cd /root/src/data
/root/src/data
/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]

command line interface


The model can be used in many ways. The most straightforward test can be performed using dp test


First, we test the H2O-PD validation set using the singletask model H2O-PD.pt.

[4]
!dp test -m ../model/H2O-PD.pt -n 5 -s H2O-PD_valid

Here -m gives the model checkpoint to import, -s the path to the system to test, and -n the number of frames to test. The other command-line options of dp test can be listed with:

[5]
!dp test --help
usage: dp test [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-l LOG_PATH]
               [-m MODEL] [-s SYSTEM | -f DATAFILE] [-S SET_PREFIX]
               [-n NUMB_TEST] [-r RAND_SEED] [--shuffle-test] [-d DETAIL_FILE]
               [-a] [--head HEAD]

options:
  -h, --help            show this help message and exit
  -v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0}
                        set verbosity level by string or number, 0=ERROR, 1=WARNING, 2=INFO and 3=DEBUG (default: INFO)
  -l LOG_PATH, --log-path LOG_PATH
                        set log file to log messages to disk, if not specified, the logs will only be output to console (default: None)
  -m MODEL, --model MODEL
                        Frozen model file (prefix) to import. TensorFlow backend: suffix is .pb; PyTorch backend: suffix is .pth. (default: frozen_model)
  -s SYSTEM, --system SYSTEM
                        The system dir. Recursively detect systems in this directory (default: .)
  -f DATAFILE, --datafile DATAFILE
                        The path to the datafile, each line of which is a path to one data system. (default: None)
  -S SET_PREFIX, --set-prefix SET_PREFIX
                        [DEPRECATED] Deprecated argument. (default: None)
  -n NUMB_TEST, --numb-test NUMB_TEST
                        The number of data for test. 0 means all data. (default: 0)
  -r RAND_SEED, --rand-seed RAND_SEED
                        The random seed (default: None)
  --shuffle-test        Shuffle test data (default: False)
  -d DETAIL_FILE, --detail-file DETAIL_FILE
                        The prefix to files where details of energy, force and virial accuracy/accuracy per atom will be written (default: None)
  -a, --atomic          Test the accuracy of atomic label, i.e. energy / tensor (dipole, polar) (default: False)
  --head HEAD           (Supported backend: PyTorch) Task head to test if in multi-task mode. (default: None)

examples:
    dp test -m graph.pb -s /path/to/system -n 30

We then use the multitask model OpenLAM_2.2.0_27heads_beta3.pt to test the same H2O-PD validation set. Note that testing with a multitask model requires specifying the task head, here H2O_H2O-PD.

[6]
!dp test -m ../model/OpenLAM_2.2.0_27heads_beta3.pt -n 5 -s H2O-PD_valid --head H2O_H2O-PD
  • --head: Task head to test if in multi-task mode.
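
The two dp test invocations above differ only in the --head option. As a minimal sketch, the command line can be assembled in Python before being handed to a shell. The helper name dp_test_command is hypothetical and not part of DeePMD-kit; it uses only the -m, -s, -n and --head options shown above.

```python
import shlex
from typing import List, Optional


def dp_test_command(model: str, system: str, n_frames: int,
                    head: Optional[str] = None) -> List[str]:
    """Assemble a `dp test` argument list (hypothetical helper)."""
    cmd = ["dp", "test", "-m", model, "-s", system, "-n", str(n_frames)]
    if head is not None:
        # A multitask checkpoint needs the task head to be named explicitly.
        cmd += ["--head", head]
    return cmd


single = dp_test_command("../model/H2O-PD.pt", "H2O-PD_valid", 5)
multi = dp_test_command("../model/OpenLAM_2.2.0_27heads_beta3.pt",
                        "H2O-PD_valid", 5, head="H2O_H2O-PD")
print(shlex.join(single))
print(shlex.join(multi))
```

In an environment where dp is installed, either list could be passed to subprocess.run(cmd, check=True).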

python interface


The Python interface can be used to obtain the energy, forces and virial of a specific structure with a DPA-2 model.

[7]
import torch
from deepmd.pt.infer.deep_eval import DeepPot
import numpy as np

# This structure is a 96-atom water configuration (32 atoms of type 7, 64 atoms of type 0)
atype = [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
coords = np.array([[[ 4.88086987e+00, 5.08849001e+00, 5.81736994e+00],
[ 1.67288995e+00, 2.79171991e+00, 6.14008999e+00],
[ 6.33997011e+00, 5.87844992e+00, 1.45858002e+00],
[ 5.30080986e+00, 7.28452015e+00, 6.98121023e+00],
[ 3.75292993e+00, 5.94418001e+00, 8.13750029e-01],
[ 5.65410995e+00, 6.82400018e-02, 4.64956999e+00],
[ 4.69849014e+00, 1.07639998e-01, 2.39014006e+00],
[ 3.33181000e+00, 3.83896995e+00, 7.35652018e+00],
[ 1.38023996e+00, 2.30410004e+00, 3.73708010e+00],
[ 3.24392009e+00, 3.83446002e+00, 4.62864017e+00],
[ 6.81804991e+00, 1.77417004e+00, 9.16799977e-02],
[ 1.44351006e+00, 4.67081022e+00, 5.27610004e-01],
[ 3.63952994e+00, 6.09609985e+00, 3.87172008e+00],
[ 6.60783005e+00, 3.06886005e+00, 5.08682013e+00],
[ 4.59574986e+00, 2.11696005e+00, 1.23446000e+00],
[ 9.44079995e-01, 6.51146984e+00, 3.75208998e+00],
[ 1.95975995e+00, 2.45908999e+00, 1.20615995e+00],
[ 1.90281999e+00, 4.64120007e+00, 2.83933997e+00],
[ 4.15106010e+00, 4.22786999e+00, 2.35461998e+00],
[ 3.15064001e+00, 1.02985001e+00, 4.58635998e+00],
[ 2.36810994e+00, 3.50479990e-01, 2.19712996e+00],
[ 6.75059986e+00, 3.17266989e+00, 2.38828993e+00],
[ 4.99685001e+00, 2.19505000e+00, 3.72282004e+00],
[ 2.89387989e+00, 8.82499993e-01, 6.97205019e+00],
[ 6.02069998e+00, 5.09185982e+00, 3.67828989e+00],
[ 4.65655994e+00, 2.32630992e+00, 6.16604996e+00],
[ 6.03310013e+00, 3.99452996e+00, 1.01709999e-01],
[ 1.11740005e+00, 6.97051001e+00, 3.97839993e-01],
[ 8.78780007e-01, 6.62689984e-01, 5.63720989e+00],
[ 2.55311990e+00, 5.88978004e+00, 5.95164013e+00],
[ 2.21489996e-01, 5.52580976e+00, 5.79211998e+00],
[-9.17000044e-03, 9.28219974e-01, 2.29167008e+00],
[ 3.98556995e+00, 4.34110022e+00, 5.18548012e+00],
[ 4.21743011e+00, 5.72653008e+00, 6.09771013e+00],
[ 5.81910014e-01, 2.03183007e+00, 7.19089985e+00],
[ 2.31051993e+00, 2.11350989e+00, 6.47226000e+00],
[ 6.95373011e+00, 5.62523985e+00, 2.17477989e+00],
[ 6.72261000e+00, 6.29493999e+00, 5.44109011e+00],
[ 5.46083021e+00, 7.31151009e+00, 5.98238993e+00],
[ 3.88695002e+00, 6.64919972e-01, 6.95054007e+00],
[ 4.55715990e+00, 6.37985992e+00, 4.42229986e-01],
[ 2.90130997e+00, 3.28073001e+00, 6.15589976e-01],
[ 5.54563999e+00, 1.08753002e+00, 4.38471985e+00],
[ 5.16894007e+00, 6.81580019e+00, 4.15231991e+00],
[ 4.61714983e+00, 1.15058994e+00, 1.42268002e+00],
[ 5.75011015e+00, 2.56732011e+00, 3.15859008e+00],
[ 1.59423006e+00, 3.51257992e+00, 6.73897982e+00],
[ 8.17659974e-01, 9.93100032e-02, 6.87493992e+00],
[ 2.15335989e+00, 1.75983000e+00, 4.06019020e+00],
[ 9.04070020e-01, 1.67244995e+00, 3.11746001e+00],
[ 2.53470993e+00, 3.43996000e+00, 5.22603989e+00],
[ 3.78940010e+00, 3.18470001e+00, 4.03200006e+00],
[ 5.76567984e+00, 6.45699978e-01, 7.30638981e+00],
[ 6.27276993e+00, 2.76024008e+00, 7.39845991e+00],
[ 1.63831997e+00, 4.72740984e+00, 1.57593000e+00],
[ 5.38829982e-01, 4.33784008e+00, 5.76900005e-01],
[ 3.06592011e+00, 5.92568016e+00, 4.67959976e+00],
[ 2.18319988e+00, 7.26775980e+00, 3.03326988e+00],
[ 3.42449993e-01, 2.80690002e+00, 4.48589993e+00],
[ 5.42928982e+00, 2.70057988e+00, 5.66814995e+00],
[ 2.26852989e+00, 1.73099005e+00, 1.78567004e+00],
[ 4.41227007e+00, 3.39329004e+00, 1.89870000e+00],
[ 6.81015015e+00, 5.67325020e+00, 3.87030005e+00],
[ 5.66749990e-01, 7.17967987e+00, 3.08417010e+00],
[ 1.17006004e+00, 2.81805992e+00, 1.67999995e+00],
[ 1.70717001e+00, 3.76209998e+00, 3.24534011e+00],
[ 3.25989008e+00, 5.42201996e+00, 3.27817988e+00],
[ 1.48038006e+00, 5.37980986e+00, 3.36378002e+00],
[ 4.97062016e+00, 4.61452007e+00, 2.79702997e+00],
[ 4.06063986e+00, 5.42372990e+00, 1.59935999e+00],
[ 4.12667990e+00, 1.88898003e+00, 5.46392012e+00],
[ 2.46283007e+00, 7.42739975e-01, 5.22220993e+00],
[ 1.69474006e+00, 1.34199997e-02, 1.47354996e+00],
[ 3.89336991e+00, 7.28771019e+00, 2.28997993e+00],
[ 6.21744013e+00, 3.42370009e+00, 1.60818005e+00],
[ 6.68333006e+00, 4.00273991e+00, 2.97725010e+00],
[ 4.50516987e+00, 1.62489998e+00, 3.14264011e+00],
[ 3.33305001e+00, 1.38720006e-01, 4.14518023e+00],
[ 2.60663009e+00, 6.67814016e+00, 6.50474024e+00],
[ 2.77786994e+00, 7.44029999e-01, 4.91820008e-01],
[ 5.37717009e+00, 6.69230986e+00, 2.00269008e+00],
[ 5.59570980e+00, 4.83723021e+00, 4.56727982e+00],
[ 4.73746014e+00, 2.22565007e+00, 2.75880009e-01],
[ 3.92073011e+00, 3.30237007e+00, 6.73226976e+00],
[ 5.15031004e+00, 4.52967978e+00, 6.61049986e+00],
[ 5.92167997e+00, 4.81593990e+00, 6.31749988e-01],
[ 1.49842000e+00, 5.95369005e+00, 3.53689998e-01],
[ 6.97584009e+00, 6.45935011e+00, 8.98949981e-01],
[ 8.66829991e-01, 4.45000008e-02, 4.84080982e+00],
[ 1.99149996e-01, 1.31905997e+00, 5.42710018e+00],
[ 3.50265002e+00, 4.80391979e+00, 2.36179993e-01],
[ 1.58536005e+00, 5.62666988e+00, 5.81293011e+00],
[ 6.92963982e+00, 5.35320997e+00, 6.67840004e+00],
[ 6.93877983e+00, 3.84588003e+00, 5.63868999e+00],
[ 6.09070015e+00, 8.65360022e-01, 2.58961010e+00],
[ 6.96416998e+00, 1.27883005e+00, 1.35783005e+00]]])
cells = np.array([[[ 7.02786398, 0. , 0. ],
[ 0.14013857, 7.30525923, 0. ],
[-0.12944618, 0.04454867, 7.40261316]]])
label_energy = np.array([-490.48730469])

model = DeepPot("../model/H2O-PD.pt")
energy, force, virial = model.eval(coords, cells, atype)
print("Predict energy: %.5f"%(energy[0][0]))
print("Label energy: %.5f"%(label_energy[0]))
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
Predict energy: -490.54547
Label energy: -490.48730
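
To put the two numbers above in perspective, the difference can be normalized by the number of atoms. The sketch below copies the printed values by hand. The atom count of 96 is taken from the number of coordinate rows in the structure, and the energy unit (eV) is an assumption, since the notebook does not state it.

```python
# Values copied from the cell output above; the unit (eV) is an assumption.
predicted_energy = -490.54547
reference_energy = -490.48730
n_atoms = 96  # number of coordinate rows in the structure above

abs_error = abs(predicted_energy - reference_energy)
per_atom_error = abs_error / n_atoms
print(f"Absolute error: {abs_error:.5f}")
print(f"Error per atom: {per_atom_error:.6f}")
```

A single frame says little about overall accuracy; dp test over many frames, as shown earlier, gives a more meaningful error estimate.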

Model Training


We're going to demonstrate how to perform singletask training, multitask training, and finetuning based on a pretrained model.


singletask

[4]
cd /root/src/train/singletask
/root/src/train/singletask

Here we use a small subset of the dataset H2O-PD_DPA_v1_0 to train a singletask model. The file input.json is the input script.

[9]
cat input.json

The parameters in input.json are the same as in previous versions of DeePMD-kit, except for the descriptor section. For more details, see our paper.


The training can be invoked by

[11]
!dp --pt train input.json

During the training, the error of the model is tested every disp_freq training steps. The training error and validation error are printed correspondingly in the file disp_file (default is lcurve.out). The batch size can be set in the input script by the key batch_size in the corresponding sections for the training and validation data set. An example of the output:

[12]
!head lcurve.out

The file contains 8 columns, from left to right, which are the training step, the validation loss, training loss, root mean square (RMS) validation error of energy, RMS training error of energy, RMS validation error of force, RMS training error of force and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system. One can visualize this file with a simple Python script:

[13]
import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt("lcurve.out", names=True)
for name in data.dtype.names[1:-1]:
    plt.plot(data["step"], data[name], label=name)
plt.legend()
plt.xlabel("Step")
plt.ylabel("Loss")
plt.xscale("symlog")
plt.yscale("log")
plt.grid()
plt.show()
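
Besides plotting, the same file can be summarized numerically. The sketch below parses a small hypothetical excerpt in the eight-column layout described above and collects the values of the last recorded step. The column names and all numbers are made up for illustration; only the column order follows the description in the text.

```python
import io

import numpy as np

# Hypothetical two-row excerpt; column names and values are illustrative only.
sample = io.StringIO(
    "# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr\n"
    "0 2.0e+01 2.1e+01 1.0e-01 1.1e-01 6.0e-01 6.5e-01 1.0e-03\n"
    "100 5.0e+00 5.2e+00 2.0e-02 2.1e-02 1.5e-01 1.6e-01 9.0e-04\n"
)

# names=True takes the field names from the commented header line.
data = np.genfromtxt(sample, names=True)

# Collect every column of the last row into a plain dictionary.
last = data[-1]
summary = {name: float(last[name]) for name in data.dtype.names}
for name, value in summary.items():
    print(f"{name:>12s}: {value:g}")
```

Replacing the StringIO object with the path "lcurve.out" would apply the same summary to a real training run.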

multitask

[5]
cd ../multitask
/root/src/train/multitask

Training on multiple datasets (each containing several data systems) can be performed in multi-task mode, with one shared descriptor and a separate fitting net for each dataset. To enable multi-task mode, change the following parameters in the training input script:

  • model -> model_dict: each key defines one individual fitting net.

  • training_data, validation_data -> data_dict: each key defines one dataset, containing several data systems, for the corresponding fitting net. The keys must be consistent with those in model_dict.

  • loss -> loss_dict: each key defines the loss settings for the corresponding fitting net. The keys must be consistent with those in model_dict.

  • model_prob: each key maps to a non-negative integer or float that sets the probability of choosing the corresponding fitting net during training.
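
The key-consistency requirement above can be checked before launching a run. The following is a minimal sketch, not DeePMD-kit code: the function name is hypothetical, the nesting of the sections (model_dict under model, data_dict and model_prob under training, loss_dict at the top level) is an assumption about the input layout, and so is the idea that model_prob is normalized into probabilities and lists every head.

```python
from typing import Dict


def check_multitask_config(config: dict) -> Dict[str, float]:
    """Check that head names agree everywhere; return normalized model_prob."""
    heads = set(config["model"]["model_dict"])
    training = config["training"]
    sections = {
        "loss_dict": config["loss_dict"],
        "data_dict": training["data_dict"],
        "model_prob": training["model_prob"],
    }
    for name, section in sections.items():
        if set(section) != heads:
            raise ValueError(
                f"{name} keys {sorted(section)} differ from "
                f"model_dict keys {sorted(heads)}"
            )
    weights = training["model_prob"]
    if any(w < 0 for w in weights.values()):
        raise ValueError("model_prob entries must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("model_prob must contain at least one positive entry")
    return {head: w / total for head, w in weights.items()}


# Skeleton config with three hypothetical head names and empty settings.
config = {
    "model": {"model_dict": {"FerroEle": {}, "H2O-PD": {}, "SemiCond": {}}},
    "loss_dict": {"FerroEle": {}, "H2O-PD": {}, "SemiCond": {}},
    "training": {
        "data_dict": {"FerroEle": {}, "H2O-PD": {}, "SemiCond": {}},
        "model_prob": {"FerroEle": 2, "H2O-PD": 1, "SemiCond": 1},
    },
}
probs = check_multitask_config(config)
print(probs)
```

With the weights 2, 1 and 1, the first head would be chosen in half of the training steps under this normalization.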


Here we use three different datasets (small subsets of FerroEle_DPA_v1_0, H2O-PD_DPA_v1_0 and SemiCond_DPA_v1_0) to train a multitask model with three task heads.

[15]
cat input.json

The training procedure will automatically choose single-task or multi-task mode, based on the above parameters. The training can be invoked by

[16]
!dp --pt train input.json
[17]
!head lcurve.out

finetune

[6]
cd ../finetune
/root/src/train/finetune

Pretraining and finetuning is a widely used approach in fields such as Computer Vision (CV) and Natural Language Processing (NLP) that vastly reduces the training cost, but it is not trivial for potential models. The compositions and configurations of the data samples, or even the computational parameters of the upstream software (such as VASP), may differ between the pretraining and target datasets, leading to energy shifts or other inconsistencies in the training data.

The multitask training mode can overcome these difficulties. The DPA-2 model is expected to learn the common knowledge in the pretraining datasets and thus reduce the computational cost of downstream training tasks.

Here we have a multitask model, OpenLAM_2.2.0_27heads_beta3.pt, pretrained on 27 different datasets. Finetuning can be performed by simply running:

[22]
!dp --pt train input.json --finetune ../../model/OpenLAM_2.2.0_27heads_beta3.pt --model-branch H2O_H2O-PD

The finetuning procedure inherits the neural network parameters of the descriptor from the pretrained multitask model. The fitting net is either re-initialized or inherited from a branch of the pretrained model, depending on the argument -m.

  • -m (--model-branch): the model branch chosen for finetuning if the pretrained model is multi-task. If not specified, the fitting net is re-initialized.

In the singletask, multitask and finetune modes alike, the training set contains H2O-PD, so we can directly compare the validation errors on the H2O-PD dataset with a Python script:

[23]
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(14,4))
data_singletask = np.genfromtxt("../singletask/lcurve.out", names=True)
data_multitask = np.genfromtxt("../multitask/lcurve.out", names=True)
data_finetune = np.genfromtxt("lcurve.out", names=True)
for idx,ii in enumerate([3,5]):
    plt.subplot(1,3,idx+1)
    for name in data_singletask.dtype.names[ii:ii+1]:
        plt.plot(data_singletask["step"], data_singletask[name], label=f"singletask_{name}")
    for name in data_finetune.dtype.names[ii:ii+1]:
        plt.plot(data_finetune["step"], data_finetune[name], label=f"finetune_{name}")
    for name in data_multitask.dtype.names[ii:ii+1]:
        plt.plot(data_multitask["step"], data_multitask[name], label=f"multitask_{name}")
    plt.legend()
    plt.xlabel("Step")
    plt.ylabel("Loss")
    #plt.xscale("symlog")
    plt.yscale("log")
    plt.grid()
plt.show()

dp freeze


The .pth file used for molecular dynamics simulations can be obtained with dp freeze.

[24]
!dp --pt freeze
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-09-14 11:42:54,054] DEEPMD INFO    DeePMD version: 3.0.0b3

Molecular Dynamics


The model can drive molecular dynamics in LAMMPS.

[7]
cd ../../md/water_192
/root/src/md/water_192
[8]
ls
data.dpa2*  dpa2_in.lammps*  dpa2_model.pth*

Here data.dpa2 gives the initial configuration of the water MD simulation, and dpa2_in.lammps is the LAMMPS input script. Inspecting dpa2_in.lammps shows that it is a rather standard LAMMPS input file for an MD simulation, with only two exceptional lines:

[9]
'''
# See https://deepmd.rtfd.io/lammps/ for usage
pair_style deepmd dpa2_model.pth
# If atom names (O H in this example) are not set in the pair_coeff command, the type_map defined by the training parameter will be used by default.
pair_coeff * * O H
'''

where the pair style deepmd is invoked and the model file dpa2_model.pth is provided, which means the atomic interaction will be computed by the DPA-2 model that is stored in the file dpa2_model.pth.


In an environment with a compatible version of LAMMPS, the deep potential molecular dynamics can be performed via

[10]
!lmp -i dpa2_in.lammps

Distillation


Distillation can significantly improve the efficiency of finetuned models in MD simulations for production. Distillation requires DP-Gen2. First, install the latest version of DP-Gen2.

[8]
!pip install git+https://github.com/zjgemi/dpgen2@deepmd-pytorch
!pip install -U dpdata
[9]
cd ../../
/root/src
[10]
mkdir distillation
[11]
cd distillation
/root/src/distillation

This example provides the finetuned model, together with the training and validation data used for finetuning, in the mounted dataset. Link them into the working directory.

[17]
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/finetuned_model.pt teacher_model.pt
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/H2O-PBE0TS-MD/train train
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/H2O-PBE0TS-MD/valid valid

Then we prepare the initial data for DP training, i.e. we use the finetuned model to label some data, e.g. the training data used for finetuning.

[18]
import dpdata
import numpy as np
import os
from deepmd_pt.infer.deep_eval import DeepPot
from pathlib import Path
from tqdm import tqdm
from typing import List, Optional, Tuple

all_type_map = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd", "In", "Sn", "Sb", "Te", "I", "Xe", "Cs", "Ba", "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg", "Tl", "Pb", "Bi", "Po", "At", "Rn", "Fr", "Ra", "Ac", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk", "Cf", "Es", "Fm", "Md", "No", "Lr", "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds", "Rg", "Cn", "Nh", "Fl", "Mc", "Lv", "Ts", "Og"]

class DPPTPredict:
    def load_model(self, model: Path):
        self.dp = DeepPot(model)

    def evaluate(self,
                 coord: np.ndarray,
                 cell: Optional[np.ndarray],
                 atype: List[int]
                 ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        coord = coord.reshape([1, -1, 3])
        if cell is not None:
            cell = cell.reshape([1, 3, 3])
        atype = atype.reshape([1, -1])
        e, f, v = self.dp.eval(coord, cell, atype, infer_batch_size=1)
        return e.reshape([1])[0], f.reshape([-1, 3]), v.reshape([3, 3])

    def predict(self, input_path="input", output_path="output"):
        for f in Path(input_path).rglob("type.raw"):
            sys = f.parent
            print(sys)
            d = dpdata.MultiSystems()
            mixed_type = len(list(sys.glob("*/real_atom_types.npy"))) > 0
            if mixed_type:
                d.load_systems_from_file(sys, fmt="deepmd/npy/mixed")
            else:
                k = dpdata.LabeledSystem(sys, fmt="deepmd/npy")
                d.append(k)
            for k in d:
                for i in tqdm(range(len(k))):
                    cell = k["cells"][i]
                    if k.nopbc:
                        cell = None
                    coord = k["coords"][i]
                    ori_atype = k["atom_types"]
                    anames = k["atom_names"]
                    atype = np.array([all_type_map.index(anames[j]) for j in ori_atype])
                    e, f, v = self.evaluate(coord, cell, atype)
                    k.data["energies"][i] = e
                    k.data["forces"][i] = f
                    k.data["virials"][i] = v
            # For configurations in DP-Gen2 only accept 1-level dir
            out_dir = os.path.join(output_path, str(sys.relative_to(input_path)).replace("/", "_"))
            if len(d) == 1:
                d[0].to_deepmd_npy_mixed(out_dir)
            else:
                # The multisystem is loaded from one dir, thus we can safely keep one dir
                d.to_deepmd_npy_mixed(out_dir + ".tmp")
                fs = os.listdir(out_dir + ".tmp")
                assert len(fs) == 1
                os.rename(os.path.join(out_dir + ".tmp", fs[0]), out_dir)
                os.rmdir(out_dir + ".tmp")

d = DPPTPredict()
d.load_model('teacher_model.pt')
d.predict('train', 'train_predict')
d.predict('valid', 'valid_predict')
train
100%|██████████| 2000/2000 [03:48<00:00,  8.75it/s]
valid
100%|██████████| 2000/2000 [03:45<00:00,  8.88it/s]
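
One step in the labelling cell above that is easy to miss is the type remapping: each system stores atom types as indices into its own atom_names list, while the teacher model is evaluated with indices into the periodic-table-ordered all_type_map. A minimal sketch of that remapping follows, using a hypothetical six-atom system and only the first eight entries of all_type_map:

```python
import numpy as np

# First eight entries of all_type_map from the cell above (H ... O).
global_type_map = ["H", "He", "Li", "Be", "B", "C", "N", "O"]

# Hypothetical system: local type 0 is "O", local type 1 is "H".
atom_names = ["O", "H"]
atom_types = np.array([0, 0, 1, 1, 1, 1])

# Translate each local index into the index of the same element name
# in the global type map, as done in DPPTPredict.predict.
global_types = np.array(
    [global_type_map.index(atom_names[j]) for j in atom_types]
)
print(global_types)
```

Oxygen ends up as index 7 and hydrogen as index 0, which matches the atype values used in the Python interface example earlier in this notebook.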

Then we prepare the initial configurations for MD exploration. Here we sample 100 configurations randomly from the training data.

[19]
import random

n_select = 100
m = dpdata.MultiSystems()
m.load_systems_from_file("train_predict", fmt="deepmd/npy/mixed")
if m.get_nframes() <= n_select:
    os.symlink("train_predict", "init")
else:
    ratio = n_select / m.get_nframes()
    new = dpdata.MultiSystems()
    for s in m:
        n = int(len(s)*ratio)
        if random.random() < len(s)*ratio - n:
            n += 1
        if n > 0:
            new.append(s.sub_system(random.sample(range(len(s)), n)))
    new.to_deepmd_npy_mixed("init")
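
The sampling cell above spreads the n_select frames over the systems in proportion to their size, rounding each fractional share up or down at random. The sketch below isolates that allocation step without dpdata. The function name and the system sizes are hypothetical.

```python
import random
from typing import List


def allocate_frames(system_sizes: List[int], n_select: int,
                    rng: random.Random) -> List[int]:
    """Number of frames to draw from each system (stochastic rounding)."""
    total = sum(system_sizes)
    if total <= n_select:
        # Nothing to subsample: keep every frame.
        return list(system_sizes)
    ratio = n_select / total
    counts = []
    for size in system_sizes:
        target = size * ratio  # fractional share of this system
        n = int(target)
        # Round up with probability equal to the fractional part.
        if rng.random() < target - n:
            n += 1
        counts.append(n)
    return counts


rng = random.Random(0)
sizes = [1210, 530, 260]  # hypothetical frame counts per system
counts = allocate_frames(sizes, 100, rng)
print(counts, sum(counts))
```

Because each system is rounded independently, the total matches n_select only on average; a single draw can come out slightly above or below it.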

Next we prepare the input file for DP-Gen2. For more details on the parameters, refer to the documentation at https://docs.deepmodeling.com/projects/dpgen2/en/latest/.

  • name: a name for the workflow. By default, the workflow server https://workflows.deepmodeling.com is used.
  • bohrium_config: your Bohrium username, password, and project ID.
  • inputs: type_map sets the type map of the final distilled model, and mass_map the corresponding masses. init_data_sys lists the system paths of the initial training data we just prepared. valid_data_sys is optional and lists the system paths of the validation data.
  • train and explore: each requires an input file template, for DP and LAMMPS respectively. Both templates are provided later.
  • explore: configurations should point to the initial configuration files we just prepared. stages specifies the settings for the MD simulations: n_sample is the number of configurations sampled from the initial configurations per iteration, and revisions gives the values of the variables in the LAMMPS input file template. Each variable's value can be a list, and the final combinations are the Cartesian product of all lists.
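
To make the Cartesian product of revisions concrete, the sketch below expands a hypothetical revisions dictionary in which two of the variables have two values each. The variable names are those used in this notebook; the extra values (300 and 100) are made up, and the order in which DP-Gen2 itself enumerates the combinations is not claimed here.

```python
import itertools

# Hypothetical revisions with more than one value for two variables.
revisions = {
    "V_NSTEPS": [10000],
    "V_TEMP": [300, 330],
    "V_DUMPFREQ": [100, 200],
}

keys = list(revisions)
# One dictionary per combination: 1 x 2 x 2 = 4 MD settings.
combinations = [
    dict(zip(keys, values))
    for values in itertools.product(*(revisions[k] for k in keys))
]
for combo in combinations:
    print(combo)
```

With the single-valued lists used in this notebook's input.json, the product contains exactly one combination.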

[24]
%%file input.json
{
"name": "water-distill",
"bohrium_config": {
"username": "<your-bohrium-username>",
"password": "<your-bohrium-password>",
"project_id": "<your-bohrium-project-id>",
"_comment": "all"
},
"default_step_config": {
"template_config": {
"image": "registry.dp.tech/dptech/prod-11881/dpgen2-utils:1.2",
"_comment": "all"
},
"_comment": "all"
},
"step_configs": {
"run_train_config": {
"template_config": {
"image": "registry.dp.tech/dptech/deepmd-kit:2.2.7-cuda11.6",
"_comment": "all"
},
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "ali",
"scass_type": "1 * NVIDIA V100_16g"
}
}
}
},
"_comment": "all"
},
"run_explore_config": {
"template_config": {
"image": "registry.dp.tech/dptech/deepmd-kit:2.2.7-cuda11.6",
"_comment": "all"
},
"continue_on_success_ratio": 0.80,
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "ali",
"scass_type": "1 * NVIDIA V100_16g"
}
}
}
},
"template_slice_config": {
"group_size": 5,
"pool_size": 1
},
"_comment": "all"
},
"run_fp_config": {
"template_config": {
"image": "registry.dp.tech/dplc/deepmd-pytorch:d74fa",
"_comment": "all"
},
"continue_on_success_ratio": 0.80,
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "ali",
"scass_type": "1 * NVIDIA V100_16g"
}
}
}
},
"template_slice_config": {
"group_size": 500,
"pool_size": 1
},
"_comment": "all"
},
"_comment": "all"
},
"upload_python_packages": [
"/opt/mamba/lib/python3.10/site-packages/dpgen2",
"/opt/mamba/lib/python3.10/site-packages/dpdata"
],
"inputs": {
"type_map": [
"O",
"H"
],
"mixed_type": true,
"mass_map": [
16.0,
4.0
],
"init_data_prefix": null,
"init_data_sys": [
"train_predict"
],
"valid_data_sys": [
"valid_predict"
],
"_comment": "all"
},
"train": {
"type": "dp",
"numb_models": 4,
"config": {
"init_model_policy": "yes",
"init_model_old_ratio": 0.90,
"init_model_numb_steps": 500000,
"init_model_start_lr": 1e-4,
"init_model_start_pref_e": 0.25,
"init_model_start_pref_f": 100,
"_comment": "all"
},
"template_script": "train.json",
"_comment": "all"
},
"explore": {
"type": "lmp",
"config": {
"command": "lmp -var restart 0"
},
"convergence": {
"type": "adaptive-lower",
"conv_tolerance": 0.005,
"_numb_candi_f": 3000,
"rate_candi_f": 0.15,
"level_f_hi": 0.5,
"n_checked_steps": 8,
"_command": "all"
},
"max_numb_iter": 16,
"fatal_at_max": false,
"configuration_prefix": null,
"configurations": [
{
"type": "file",
"files": [
"init"
],
"fmt": "deepmd/npy/mixed"
}
],
"stages": [
[
{
"type": "lmp-template",
"lmp": "template.lammps",
"trj_freq": 100,
"revisions": {
"V_NSTEPS": [
10000
],
"V_TEMP": [
330
],
"V_DUMPFREQ": [
200
]
},
"sys_idx": [
0
],
"n_sample": 100
}
]
],
"_comment": "all"
},
"fp": {
"type": "deepmd_pt",
"task_max": 4000,
"run_config" : {
"teacher_model_path": "teacher_model.pt"
},
"inputs_config": {},
"_comment": "all"
}
}
Overwriting input.json

Here is a simple LAMMPS input template for NVT simulations, where the number of steps, temperature, and output frequency are provided as variables.

[21]
%%file template.lammps
variable NSTEPS equal V_NSTEPS
variable TEMP equal V_TEMP
variable THERMO_FREQ equal V_DUMPFREQ
variable TAU_T equal 0.100000

# Initialization
units metal
dimension 3
atom_style atomic
boundary p p p

read_data conf.lmp
mass 1 16.0
mass 2 4.0

# Interatomic potentials - DeepMD
pair_style deepmd
pair_coeff * *

timestep 0.001 # ps
velocity all create ${TEMP} 1815191 mom yes rot yes dist gaussian

run_style verlet
fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}
thermo_style custom step temp pe etotal press
thermo ${THERMO_FREQ} # Output thermodynamic properties
dump dpgen_dump
run ${NSTEPS}
Writing template.lammps
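
As a rough illustration of how the V_* placeholders in the template become concrete values, the sketch below substitutes one set of revisions into the first three lines of the template by plain string replacement. This mechanism is an assumption made for illustration; DP-Gen2's actual implementation may differ. The sketch also works out the simulated time implied by the number of steps and the timestep.

```python
# First three lines of template.lammps, copied from the cell above.
template_head = (
    "variable NSTEPS equal V_NSTEPS\n"
    "variable TEMP equal V_TEMP\n"
    "variable THERMO_FREQ equal V_DUMPFREQ\n"
)

# Values taken from the revisions section of input.json.
revision = {"V_NSTEPS": 10000, "V_TEMP": 330, "V_DUMPFREQ": 200}

# Assumed mechanism: replace each placeholder with its value as text.
rendered = template_head
for placeholder, value in revision.items():
    rendered = rendered.replace(placeholder, str(value))
print(rendered)

timestep_ps = 0.001  # "timestep 0.001" in the template, commented as ps
simulated_ps = revision["V_NSTEPS"] * timestep_ps
print(f"Simulated time per trajectory: {simulated_ps:g} ps")
```

The lines pair_style deepmd, pair_coeff * * and dump dpgen_dump in the template are left untouched by this sketch.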

This is the DP training input template for the distilled model (DPA-1 without attention layers).

[22]
%%file train.json
{
"model": {
"type_map": [
"O",
"H"
],
"descriptor": {
"type": "se_atten_v2",
"sel": 120,
"rcut_smth": 0.5,
"rcut": 6.0,
"neuron": [
25,
50,
100
],
"resnet_dt": false,
"axis_neuron": 16,
"seed": 1,
"attn": 128,
"attn_layer": 0,
"attn_dotr": true,
"attn_mask": false,
"_comment": " that's all"
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"resnet_dt": true,
"seed": 1,
"_comment": " that's all"
},
"_comment": " that's all"
},
"learning_rate": {
"type": "exp",
"decay_steps": 5000,
"start_lr": 0.001,
"stop_lr": 3.51e-08,
"_comment": "that's all"
},
"loss": {
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": " that's all"
},
"training": {
"training_data": {
"systems": [],
"batch_size": "auto",
"_comment": "that's all"
},
"validation_data": {
"systems": [],
"batch_size": 1,
"numb_btch": 3,
"_comment": "that's all"
},
"numb_steps": 1000000,
"seed": 10,
"disp_file": "lcurve.out",
"disp_freq": 100,
"save_freq": 1000,
"_comment": "that's all"
},
"_comment": "that's all"
}
Writing train.json
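
To make the `learning_rate` block above more concrete, here is a minimal sketch of the schedule it implies. It assumes the stepwise exponential decay described in the DeePMD-kit documentation, lr(t) = start_lr * decay_rate^(t // decay_steps), with decay_rate chosen so that the learning rate reaches `stop_lr` at `numb_steps`. The function names are illustrative and are not part of DeePMD-kit.

```python
import math

def decay_rate(start_lr, stop_lr, decay_steps, numb_steps):
    """Per-stage decay factor (assumed formula, see the lead-in)."""
    return math.exp(math.log(stop_lr / start_lr) / (numb_steps / decay_steps))

def learning_rate(step, start_lr, stop_lr, decay_steps, numb_steps):
    """Learning rate at a given training step under the assumed schedule."""
    rate = decay_rate(start_lr, stop_lr, decay_steps, numb_steps)
    # The rate is applied once per completed block of decay_steps steps
    return start_lr * rate ** (step // decay_steps)

# Values copied from the train.json template above
cfg = dict(start_lr=0.001, stop_lr=3.51e-08, decay_steps=5000, numb_steps=1000000)
for step in (0, 250000, 500000, 1000000):
    print(step, f"{learning_rate(step, **cfg):.3e}")
```

With these settings the learning rate is lowered 200 times over the run, once every 5000 steps.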

Finally, submit the distillation workflow.

[25]
!dpgen2 submit input.json
Workflow has been submitted (ID: water-distill-58xlv, UID: c406ecf3-b7a3-4243-b45b-7e00ab1aaff2)
Workflow link: https://workflows.deepmodeling.com/workflows/argo/water-distill-58xlv

The progress of the workflow can be tracked through the link printed above. The metrics for each distillation iteration can be obtained with the `dpgen2 status` command.

[28]
!dpgen2 status input.json water-distill-58xlv
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 10.80it/s]
No finished iteration found


DP-Gen based on a DPA-2 pretrained model


Fine-tuning from a DPA-2 pretrained model reduces the amount of data required for training. Running DP-Gen from a DPA-2 pretrained model likewise reduces the amount of first-principles labelling. DP-Gen with DPA-2 requires DP-Gen2, so first make sure the latest version of DP-Gen2 is installed. The cells below then create a working directory named `dpgen` for this example.

[32]
cd ..
/root/src
/opt/mamba/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
[34]
mkdir dpgen
[37]
cd dpgen
/root/src/dpgen

The dataset for this example provides a pretrained model, initial training data for water, and the PBE POTCAR files required for the VASP calculations. Link them into the working directory.

[54]
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/pretrained_model.pt pretrained_model.pt
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/H2O-PBE0TS-MD/train train
!ln -s /bohr/dpa2-finetune-example-water-rtlk/v2/PBE PBE
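
Before moving on, it can be useful to confirm that the three links resolve, since a link to a dataset that is not mounted is created without any error. The following is a small sketch using only the standard library; the helper name `missing_inputs` is illustrative.

```python
from pathlib import Path

def missing_inputs(workdir, names=("pretrained_model.pt", "train", "PBE")):
    """Return the entries that are absent or are dangling symlinks."""
    workdir = Path(workdir)
    # Path.exists() follows symlinks, so a link whose target is missing counts as absent
    return [name for name in names if not (workdir / name).exists()]

# An empty list means all three inputs are reachable from the current directory
print(missing_inputs("."))
```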

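The initial training data is the `train` directory linked above. Next, prepare the initial configurations for MD exploration by randomly sampling about 100 configurations from the training data. Each system contributes frames in proportion to its size, and the result is written to the `init` directory in the deepmd/npy/mixed format.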

[55]
import random
import dpdata
from pathlib import Path

n_select = 100
m = dpdata.MultiSystems()

# Collect every system under train/ (each system directory contains a type.raw)
for f in Path("train").rglob("type.raw"):
    sys_dir = f.parent
    mixed_type = len(list(sys_dir.glob("*/real_atom_types.npy"))) > 0
    if mixed_type:
        m.load_systems_from_file(sys_dir, fmt="deepmd/npy/mixed")
    else:
        s = dpdata.LabeledSystem(sys_dir, fmt="deepmd/npy")
        m.append(s)

if m.get_nframes() <= n_select:
    # Few frames in total: keep them all
    m.to_deepmd_npy_mixed("init")
else:
    ratio = n_select / m.get_nframes()
    new = dpdata.MultiSystems()
    for s in m:
        # Proportional share of frames, rounded up with probability
        # equal to the fractional part
        n = int(len(s) * ratio)
        if random.random() < len(s) * ratio - n:
            n += 1
        if n > 0:
            new.append(s.sub_system(random.sample(range(len(s)), n)))
    new.to_deepmd_npy_mixed("init")
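
The `else` branch above gives each system a share of frames proportional to its size and rounds that share up with probability equal to its fractional part, so the total is close to `n_select` without being fixed at exactly that number. The sketch below isolates this rounding step in plain Python, without dpdata. The frame counts and helper names are made up for illustration.

```python
import random

def stochastic_round(x, rng):
    """Round x down, then add one with probability equal to the fractional part."""
    n = int(x)
    if rng.random() < x - n:
        n += 1
    return n

def allocate(sizes, n_select, rng):
    """Number of frames to draw from each system (hypothetical helper)."""
    ratio = n_select / sum(sizes)
    return [stochastic_round(size * ratio, rng) for size in sizes]

rng = random.Random(0)
sizes = [320, 75, 1280, 5]  # made-up frame counts per system
trials = [sum(allocate(sizes, 100, rng)) for _ in range(2000)]
# Smallest total, largest total, and average total over all trials
print(min(trials), max(trials), sum(trials) / len(trials))
```

Individual totals vary by a few frames around the target, while the average over many trials stays close to it.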

Next, prepare the input file for DP-Gen2. The `name` field sets the workflow name. By default, the workflow server https://workflows.deepmodeling.com is used. In `bohrium_config`, fill in your Bohrium username, password, and project ID. Set `init_data_sys` to the list of system paths for the initial training data.

The training, exploration, and first-principles sections (`train`, `explore`, and `fp`) each require an input file template, for DP, LAMMPS, and VASP respectively. These templates are provided in the cells that follow. In `train`, `init_models_paths` gives the paths of the pretrained models; here we provide four identical paths, one for each of the four models. In `explore`, `configurations` points to the initial configuration files we just prepared. `stages` specifies the settings for the MD simulations: `n_sample` determines how many of the initial configurations are sampled per iteration, and `revisions` specifies the values of the variables in the LAMMPS input file template. Each variable's value can be a list, and the tasks are generated from the Cartesian product of all lists. For more on the parameters, please refer to the documentation at https://docs.deepmodeling.com/projects/dpgen2/en/latest/.

[58]
%%file input.json
{
"name": "water-dpgen",
"bohrium_config": {
"username": "<your-bohrium-username>",
"password": "<your-bohrium-password>",
"project_id": "<your-bohrium-project-id>",
"_comment": "all"
},
"default_step_config": {
"template_config": {
"image": "registry.dp.tech/dptech/prod-11881/dpgen2-utils:1.2",
"_comment": "all"
},
"_comment": "all"
},
"step_configs": {
"run_train_config": {
"template_config": {
"image": "registry.dp.tech/dplc/deepmd-pytorch:d74fa",
"_comment": "all"
},
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "paratera",
"scass_type": "c10_m38_1 * NVIDIA V100"
}
}
}
},
"_comment": "all"
},
"run_explore_config": {
"template_config": {
"image": "registry.dp.tech/dptech/prod-11106/deepmd-pytorch-lammps:d74fa",
"_comment": "all"
},
"continue_on_success_ratio": 0.8,
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "paratera",
"scass_type": "c10_m38_1 * NVIDIA V100"
}
}
}
},
"template_slice_config": {
"group_size": 5,
"pool_size": 1
},
"_comment": "all"
},
"run_fp_config": {
"template_config": {
"image": "registry.dp.tech/dptech/vasp:5.4.4",
"_comment": "all"
},
"continue_on_success_ratio": 0.8,
"executor": {
"type": "dispatcher",
"retry_on_submission_error": 10,
"image_pull_policy": "IfNotPresent",
"machine_dict": {
"batch_type": "Bohrium",
"context_type": "Bohrium",
"remote_profile": {
"input_data": {
"job_type": "container",
"platform": "ali",
"scass_type": "c8_m32_cpu"
}
}
}
},
"template_slice_config": {
"group_size": 20,
"pool_size": 1
},
"_comment": "all"
},
"_comment": "all"
},
"upload_python_packages": [
"/opt/mamba/lib/python3.10/site-packages/dpgen2",
"/opt/mamba/lib/python3.10/site-packages/dpdata"
],
"inputs": {
"type_map": [
"H",
"He",
"Li",
"Be",
"B",
"C",
"N",
"O",
"F",
"Ne",
"Na",
"Mg",
"Al",
"Si",
"P",
"S",
"Cl",
"Ar",
"K",
"Ca",
"Sc",
"Ti",
"V",
"Cr",
"Mn",
"Fe",
"Co",
"Ni",
"Cu",
"Zn",
"Ga",
"Ge",
"As",
"Se",
"Br",
"Kr",
"Rb",
"Sr",
"Y",
"Zr",
"Nb",
"Mo",
"Tc",
"Ru",
"Rh",
"Pd",
"Ag",
"Cd",
"In",
"Sn",
"Sb",
"Te",
"I",
"Xe",
"Cs",
"Ba",
"La",
"Ce",
"Pr",
"Nd",
"Pm",
"Sm",
"Eu",
"Gd",
"Tb",
"Dy",
"Ho",
"Er",
"Tm",
"Yb",
"Lu",
"Hf",
"Ta",
"W",
"Re",
"Os",
"Ir",
"Pt",
"Au",
"Hg",
"Tl",
"Pb",
"Bi",
"Po",
"At",
"Rn",
"Fr",
"Ra",
"Ac",
"Th",
"Pa",
"U",
"Np",
"Pu",
"Am",
"Cm",
"Bk",
"Cf",
"Es",
"Fm",
"Md",
"No",
"Lr",
"Rf",
"Db",
"Sg",
"Bh",
"Hs",
"Mt",
"Ds",
"Rg",
"Cn",
"Nh",
"Fl",
"Mc",
"Lv",
"Ts",
"Og"
],
"mixed_type": true,
"do_finetune": true,
"mass_map": [
4.0,
4.0026,
6.94,
9.0122,
10.81,
12.011,
14.007,
15.999,
18.998,
20.18,
22.99,
24.305,
26.982,
28.0855,
30.974,
32.06,
35.45,
39.95,
39.098,
40.078,
44.956,
47.867,
50.942,
51.996,
54.938,
55.845,
58.933,
58.693,
63.546,
65.38,
69.723,
72.63,
74.922,
78.971,
79.904,
83.798,
85.468,
87.62,
88.906,
91.224,
92.906,
95.95,
97,
101.07,
102.91,
106.42,
107.87,
112.41,
114.82,
118.71,
121.76,
127.6,
126.9,
131.29,
132.91,
137.33,
138.91,
140.12,
140.91,
144.24,
145,
150.36,
151.96,
157.25,
158.93,
162.5,
164.93,
167.26,
168.93,
173.05,
174.97,
178.49,
180.95,
183.84,
186.21,
190.23,
192.22,
195.08,
196.97,
200.59,
204.38,
207.2,
208.98,
209,
210,
222,
223,
226,
227,
232.04,
231.04,
238.03,
237,
244,
243,
247,
247,
251,
252,
257,
258,
259,
262,
267,
268,
269,
270,
269,
277,
281,
282,
285,
286,
290,
290,
293,
294,
294
],
"init_data_prefix": null,
"init_data_sys": [
"train"
],
"_comment": "all"
},
"train": {
"type": "dp",
"numb_models": 4,
"init_models_paths": [
"pretrained_model.pt",
"pretrained_model.pt",
"pretrained_model.pt",
"pretrained_model.pt"
],
"config": {
"impl": "pytorch",
"finetune_args": "-m H2O_H2O-PD",
"init_model_policy": "yes",
"init_model_old_ratio": 0.9,
"init_model_numb_steps": 100000,
"init_model_start_lr": 2e-05,
"init_model_start_pref_e": 0.25,
"init_model_start_pref_f": 100,
"_comment": "all"
},
"template_script": "train.json",
"_comment": "all"
},
"explore": {
"type": "lmp",
"config": {
"command": "lmp -var restart 0",
"impl": "pytorch"
},
"convergence": {
"type": "adaptive-lower",
"conv_tolerance": 0.005,
"_numb_candi_f": 3000,
"rate_candi_f": 0.15,
"level_f_hi": 0.5,
"n_checked_steps": 8,
"_command": "all"
},
"max_numb_iter": 16,
"fatal_at_max": false,
"configuration_prefix": null,
"configurations": [
{
"type": "file",
"files": [
"init"
],
"fmt": "deepmd/npy/mixed"
}
],
"stages": [
[
{
"type": "lmp-template",
"lmp": "template.lammps",
"trj_freq": 200,
"revisions": {
"V_NSTEPS": [
5000
],
"V_TEMP": [
330
],
"V_DUMPFREQ": [
200
]
},
"sys_idx": [
0
],
"n_sample": 10
}
]
],
"_comment": "all"
},
"fp": {
"type": "vasp",
"task_max": 300,
"inputs_config": {
"pp_files": {
"O": "PBE/O/POTCAR",
"H": "PBE/H/POTCAR"
},
"incar": "INCAR",
"kspacing": 0.32,
"kgamma": true
},
"run_config": {
"command": "source /opt/intel/oneapi/setvars.sh && mpirun -n 16 vasp_std"
},
"_comment": "all"
}
}
Writing input.json
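
Because `input.json` is long, a few consistency checks before submission can catch simple mistakes. The sketch below is only an illustration: the function `check_inputs` and the rules it applies are assumptions drawn from the description above (one mass per element, one initial model path per model, credentials filled in), not a validator shipped with DP-Gen2. The small `demo` document, including its project ID, is made up.

```python
import json

def check_inputs(config):
    """Return a list of problems found in a DP-Gen2 input dict (illustrative checks only)."""
    problems = []
    inputs = config["inputs"]
    if len(inputs["type_map"]) != len(inputs["mass_map"]):
        problems.append("type_map and mass_map differ in length")
    train = config["train"]
    if len(train["init_models_paths"]) != train["numb_models"]:
        problems.append("init_models_paths does not list one path per model")
    for key, value in config["bohrium_config"].items():
        if isinstance(value, str) and value.startswith("<your-"):
            problems.append(f"bohrium_config.{key} is still a placeholder")
    return problems

# Made-up miniature input, standing in for the full file
demo = json.loads("""
{
  "bohrium_config": {"username": "<your-bohrium-username>", "project_id": "12345"},
  "inputs": {"type_map": ["H", "He"], "mass_map": [4.0, 4.0026]},
  "train": {"numb_models": 4, "init_models_paths": ["pretrained_model.pt"]}
}
""")
for problem in check_inputs(demo):
    print(problem)
```

To check the real file, load it with `json.load(open("input.json"))` and pass the result to `check_inputs`.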

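Here is a simple LAMMPS input template for NVT simulations, where the number of steps, temperature, and output frequency are provided as variables. Because the `type_map` in this example covers 118 elements, the template sets a mass for each of the 118 atom types.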

[59]
%%file template.lammps
variable NSTEPS equal V_NSTEPS
variable TEMP equal V_TEMP
variable THERMO_FREQ equal V_DUMPFREQ
variable TAU_T equal 0.100000

# Initialization
units metal
dimension 3
atom_style atomic
boundary p p p

read_data conf.lmp
mass 1 4.000
mass 2 4.003
mass 3 6.940
mass 4 9.012
mass 5 10.810
mass 6 12.011
mass 7 14.007
mass 8 15.999
mass 9 18.998
mass 10 20.180
mass 11 22.990
mass 12 24.305
mass 13 26.982
mass 14 28.085
mass 15 30.974
mass 16 32.060
mass 17 35.450
mass 18 39.950
mass 19 39.098
mass 20 40.078
mass 21 44.956
mass 22 47.867
mass 23 50.942
mass 24 51.996
mass 25 54.938
mass 26 55.845
mass 27 58.933
mass 28 58.693
mass 29 63.546
mass 30 65.380
mass 31 69.723
mass 32 72.630
mass 33 74.922
mass 34 78.971
mass 35 79.904
mass 36 83.798
mass 37 85.468
mass 38 87.620
mass 39 88.906
mass 40 91.224
mass 41 92.906
mass 42 95.950
mass 43 97.000
mass 44 101.070
mass 45 102.910
mass 46 106.420
mass 47 107.870
mass 48 112.410
mass 49 114.820
mass 50 118.710
mass 51 121.760
mass 52 127.600
mass 53 126.900
mass 54 131.290
mass 55 132.910
mass 56 137.330
mass 57 138.910
mass 58 140.120
mass 59 140.910
mass 60 144.240
mass 61 145.000
mass 62 150.360
mass 63 151.960
mass 64 157.250
mass 65 158.930
mass 66 162.500
mass 67 164.930
mass 68 167.260
mass 69 168.930
mass 70 173.050
mass 71 174.970
mass 72 178.490
mass 73 180.950
mass 74 183.840
mass 75 186.210
mass 76 190.230
mass 77 192.220
mass 78 195.080
mass 79 196.970
mass 80 200.590
mass 81 204.380
mass 82 207.200
mass 83 208.980
mass 84 209.000
mass 85 210.000
mass 86 222.000
mass 87 223.000
mass 88 226.000
mass 89 227.000
mass 90 232.040
mass 91 231.040
mass 92 238.030
mass 93 237.000
mass 94 244.000
mass 95 243.000
mass 96 247.000
mass 97 247.000
mass 98 251.000
mass 99 252.000
mass 100 257.000
mass 101 258.000
mass 102 259.000
mass 103 262.000
mass 104 267.000
mass 105 268.000
mass 106 269.000
mass 107 270.000
mass 108 269.000
mass 109 277.000
mass 110 281.000
mass 111 282.000
mass 112 285.000
mass 113 286.000
mass 114 290.000
mass 115 290.000
mass 116 293.000
mass 117 294.000
mass 118 294.000

# Interatomic potentials - DeepMD
pair_style deepmd
pair_coeff * *

timestep 0.001 # ps
velocity all create ${TEMP} 1815191 mom yes rot yes dist gaussian

run_style verlet
fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}
thermo_style custom step temp pe etotal press
thermo ${THERMO_FREQ} # Output thermodynamic properties
dump dpgen_dump
run ${NSTEPS}
Writing template.lammps
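
The `V_NSTEPS`, `V_TEMP` and `V_DUMPFREQ` placeholders in the template are filled from the `revisions` entry of `input.json`, with one task per combination of values. The following sketch shows the idea with plain string replacement and `itertools.product`. It is not DP-Gen2's own implementation, and the second temperature (300) is added here only to show more than one combination.

```python
import itertools

def expand_revisions(template, revisions):
    """Yield one filled-in copy of the template per combination of revision values."""
    keys = list(revisions)
    for values in itertools.product(*(revisions[k] for k in keys)):
        text = template
        for key, value in zip(keys, values):
            text = text.replace(key, str(value))
        yield text

template = "variable NSTEPS equal V_NSTEPS\nvariable TEMP equal V_TEMP\n"
revisions = {"V_NSTEPS": [5000], "V_TEMP": [300, 330]}  # 300 is illustrative
tasks = list(expand_revisions(template, revisions))
print(len(tasks))
print(tasks[1])
```

With the single-valued lists used in this notebook, the product contains exactly one combination per stage entry.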

This is a DP training input template for DPA-2.

[60]
%%file train.json
{
"model": {
"type_embedding": {
"neuron": [
8
],
"tebd_input_mode": "concat"
},
"type_map": [
"H",
"He",
"Li",
"Be",
"B",
"C",
"N",
"O",
"F",
"Ne",
"Na",
"Mg",
"Al",
"Si",
"P",
"S",
"Cl",
"Ar",
"K",
"Ca",
"Sc",
"Ti",
"V",
"Cr",
"Mn",
"Fe",
"Co",
"Ni",
"Cu",
"Zn",
"Ga",
"Ge",
"As",
"Se",
"Br",
"Kr",
"Rb",
"Sr",
"Y",
"Zr",
"Nb",
"Mo",
"Tc",
"Ru",
"Rh",
"Pd",
"Ag",
"Cd",
"In",
"Sn",
"Sb",
"Te",
"I",
"Xe",
"Cs",
"Ba",
"La",
"Ce",
"Pr",
"Nd",
"Pm",
"Sm",
"Eu",
"Gd",
"Tb",
"Dy",
"Ho",
"Er",
"Tm",
"Yb",
"Lu",
"Hf",
"Ta",
"W",
"Re",
"Os",
"Ir",
"Pt",
"Au",
"Hg",
"Tl",
"Pb",
"Bi",
"Po",
"At",
"Rn",
"Fr",
"Ra",
"Ac",
"Th",
"Pa",
"U",
"Np",
"Pu",
"Am",
"Cm",
"Bk",
"Cf",
"Es",
"Fm",
"Md",
"No",
"Lr",
"Rf",
"Db",
"Sg",
"Bh",
"Hs",
"Mt",
"Ds",
"Rg",
"Cn",
"Nh",
"Fl",
"Mc",
"Lv",
"Ts",
"Og"
],
"descriptor": {
"type": "hybrid",
"hybrid_mode": "sequential",
"list": [
{
"type": "se_atten",
"sel": 120,
"rcut_smth": 2.0,
"rcut": 9.0,
"neuron": [
25,
50,
100
],
"resnet_dt": false,
"axis_neuron": 12,
"seed": 1,
"attn": 128,
"attn_layer": 0,
"attn_dotr": true,
"attn_mask": false,
"post_ln": true,
"ffn": false,
"ffn_embed_dim": 1024,
"activation": "tanh",
"scaling_factor": 1.0,
"head_num": 1,
"normalize": true,
"temperature": 1.0,
"add": "concat",
"pre_add": true,
"_comment": " that's all"
},
{
"type": "se_uni",
"sel": 40,
"rcut_smth": 0.5,
"rcut": 4.0,
"nlayers": 12,
"g1_dim": 128,
"g2_dim": 32,
"attn2_hidden": 32,
"attn2_nhead": 4,
"attn1_hidden": 128,
"attn1_nhead": 4,
"axis_dim": 4,
"update_h2": false,
"update_g1_has_conv": true,
"update_g1_has_grrg": true,
"update_g1_has_drrd": true,
"update_g1_has_attn": true,
"update_g2_has_g1g1": true,
"update_g2_has_attn": true,
"attn2_has_gate": true,
"add_type_ebd_to_seq": false,
"smooth": true,
"_comment": " that's all"
}
]
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"resnet_dt": true,
"seed": 1,
"_comment": " that's all"
},
"_comment": " that's all"
},
"learning_rate": {
"type": "exp",
"decay_steps": 5000,
"start_lr": 0.0002,
"stop_lr": 3.51e-08,
"_comment": "that's all"
},
"loss": {
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": " that's all"
},
"training": {
"training_data": {
"batch_size": 1,
"_comment": "that's all"
},
"validation_data": {
"batch_size": 1,
"_comment": "that's all"
},
"numb_steps": 100000,
"warmup_steps": 0,
"gradient_max_norm": 5.0,
"seed": 1,
"disp_file": "lcurve.out",
"disp_freq": 100,
"save_freq": 2000,
"_comment": "that's all"
}
}
Writing train.json
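
The `loss` block gives start and limit prefactors for the energy, force and virial terms. As a rough guide to how they evolve, the sketch below assumes the interpolation described in the DeePMD-kit documentation, in which each prefactor moves from its start value to its limit value in proportion to lr(t)/start_lr. The function name `prefactor` is illustrative and not part of DeePMD-kit.

```python
def prefactor(start_pref, limit_pref, lr, start_lr):
    """Loss prefactor at learning rate lr (assumed linear interpolation in lr/start_lr)."""
    return limit_pref + (start_pref - limit_pref) * lr / start_lr

start_lr = 0.0002  # from the learning_rate block of the template above
for lr in (2e-4, 2e-5, 2e-6, 3.51e-8):
    pref_e = prefactor(0.02, 1, lr, start_lr)
    pref_f = prefactor(1000, 1, lr, start_lr)
    print(f"lr={lr:.2e}  pref_e={pref_e:.4f}  pref_f={pref_f:.2f}")
```

Under this assumption, training starts dominated by the force term and ends with energy and force weighted equally.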

Here is a VASP input template (INCAR).

[61]
%%file INCAR
PREC=A
ENCUT=600
ISYM=0
ALGO=fast
EDIFF=1.000000e-06
LREAL=A
NPAR=1
KPAR=1
NELMIN=4
ISIF=2
ISMEAR=1
SIGMA=1.000000
IBRION=-1
NSW=0
LWAVE=F
LCHARG=F
PSTRESS=0
KSPACING=0.160000
KGAMMA=.FALSE.
Writing INCAR

Finally, submit the DP-Gen workflow.

[62]
!dpgen2 submit input.json
Workflow has been submitted (ID: water-dpgen-596v5, UID: 9b6f5fdc-d78a-4056-a626-2c2e22628f7e)
Workflow link: https://workflows.deepmodeling.com/workflows/argo/water-dpgen-596v5

The progress of the workflow can be tracked through the link printed above. The metrics for each DP-Gen iteration can be obtained with the `dpgen2 status` command.

[65]
!dpgen2 status input.json water-dpgen-596v5
WARNING:root:Exploration scheduler not found in the global outputs
WARNING:root:no scheduler is finished

Tips

  1. Users are welcome to explore the DP Combo web server, which helps automate operations such as model training and model distillation. Related notebooks: DP Combo Tutorial (DP Combo教程), Generating Semiconductor Potentials in One Click with DP Combo (借助DP Combo一键丝滑生成半导体势函数), and Solid-State Electrolyte in Practice | DP Combo@APP Hands-on (固态电解质实战 | DP Combo@APP体验).

  2. The current DPA-2 model does not yet support features such as ZBL; we plan to implement them in the near future. If you need these features, you can use a previous version of DeePMD-kit (GitHub). Related notebook: DeePMD Tutorials, Research Cases, and Collected Q&A (DeePMD 使用教程、科研案例、问题收集合集).

Comment
 <a href="https://nb....

dfzshiwo@163.com

12-24 20:54
The link is wrong.

2043899742@qq.com

Author
12-25 00:48
Fixed, thank you.
Comment
 - data: This directo...

jianzhifu@vip.163.com

01-09 02:35
Which folder is H2O_H2O-PD in?

2043899742@qq.com

Author
01-12 03:51
Reply to jianzhifu@vip.163.com: It is under the src/data/H2O-PD_train and src/data/H2O-PD_valid folders.
Comment
 We then use the mult...

cjxxjc729

02-03 22:49
What is the task head?

Samoyezii

09-05 04:39
Available ones are ['Domains_Alloy', 'Domains_Anode', 'Domains_Cluster', 'Domains_Drug', 'Domains_FerroEle', 'Domains_OC2M', 'Domains_SSE-PBE', 'Domains_SemiCond', 'H2O_H2O-PD', 'Metals_AgAu-PBE', 'Metals_AlMgCu', 'Metals_Cu', 'Metals_Sn', 'Metals_Ti', 'Metals_V', 'Metals_W', 'Others_C12H26', 'Others_HfO2'].