Quick Start DeePMD-kit | Train Methane Deep Potential Molecular Dynamics Model
English, DeePMD-kit
Letian
Updated 2024-07-10
Recommended image: deepmd-kit:2.2.1-cuda11.6
Recommended machine type: c2_m4_cpu
DeePMD-kit Quick-Start(v1)


©️ Copyright 2024 @ Authors
Date: 2024-07-05
Sharing Agreement: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Connect button above, select the deepmd-kit:2.2.1-cuda11.6-notebook image and c8_m16_cpu node configuration, and wait for a moment to run.


Deep Potential is a collision between machine learning and physical principles, which presents a new computational paradigm as shown in the figure below.

Figure | A new computational paradigm consisting of molecular modeling, machine learning, and high-performance computing (HPC).

For a deeper understanding of Deep Potential, see 👉 From DFT to MD: A Comprehensive 「Deep Potential」 Guide to Getting Started with Materials Computation


Objective

Master the paradigm cycle of using DeePMD-kit to build deep potential molecular dynamics models and learn how to apply them to molecular dynamics tasks with complete examples.

After learning this tutorial, you will be able to:

  • Understand the data formats and run scripts required for training DeePMD-kit
  • Train, freeze, compress, and test DeePMD-kit models
  • Call DeePMD-kit for calculations in the molecular dynamics software LAMMPS.

For more information on DeePMD learning materials, please refer to the following two courses:

Reading this tutorial takes about 20 minutes at most, so let's get started!


Background

In this tutorial, we will use gaseous methane molecules as an example to provide a detailed introduction to the training and application of Deep Potential (DP) models.

DeePMD-kit is a software package that fits first-principles data with neural networks to obtain potential energy models for molecular dynamics simulations. Without manual intervention, it can convert user-provided data into a deep potential model end-to-end within several hours. The model integrates seamlessly with common molecular dynamics software such as LAMMPS, OpenMM, and GROMACS.

By combining high-performance computing and machine learning, DeePMD-kit has extended the limits of molecular dynamics simulations by several orders of magnitude, reaching system sizes of billions of atoms while retaining ab initio accuracy. The accessible simulation time scale is at least 1000 times longer than with traditional methods. This work won the 2020 ACM Gordon Bell Prize, the highest award in high-performance computing, and DeePMD-kit has been used by thousands of research groups in physics, chemistry, materials science, biology, and other fields.

For more detailed usage, you can refer to the documentation of DeePMD-kit as a complete reference.

In this case, the Deep Potential (DP) model was generated using the DeePMD-kit package (v2.2.1).


Practice


1 Data Preparation

We have prepared the initial data needed to run DeePMD-kit for you; it is located in the folder DeePMD-kit_Tutorial. You can click on the dataset on the left to view the corresponding files.

[1]
# For safety reasons, we do not have write access to the folder where the dataset is located,
# so we copy it to the `/personal/` directory:
! cp -nr /bohr/ /personal/

# Here we define some paths and switch to the working path for subsequent calls:
import os
bohr_dataset_url = "/bohr/deepmd-kit-8n4p/v1/" # The URL can be copied from the dataset on the left
work_path = os.path.join("/personal", bohr_dataset_url[1:]) # Perform a slice to remove the initial "/" in the above path
os.chdir(work_path)
print(f"The current path is: {os.getcwd()}")
The current path is: /personal/bohr/deepmd-kit-8n4p/v1

Let's take a look at the downloaded DeePMD-kit_Tutorial folder.

[2]
! tree DeePMD-kit_Tutorial -L 1
DeePMD-kit_Tutorial
├── 00.data
├── 01.train
└── 02.lmp

3 directories, 0 files

There are three sub-folders in the DeePMD-kit_Tutorial folder: 00.data, 01.train, and 02.lmp.

  • The 00.data folder is used to store training and testing data.
  • The 01.train folder contains example scripts for training models using DeePMD-kit.
  • The 02.lmp folder contains example scripts for molecular dynamics simulations using LAMMPS.

Let's first take a look at the DeePMD-kit_Tutorial/00.data folder.

[3]
! tree DeePMD-kit_Tutorial/00.data -L 1
DeePMD-kit_Tutorial/00.data
├── abacus_md
├── training_data
└── validation_data

3 directories, 0 files

DeePMD-kit is trained on first-principles data: atom types, simulation cell, atomic coordinates, atomic forces, system energy, and virial.

Before the conversion step below is run, the 00.data folder contains only the abacus_md folder, which holds the output of an ab initio molecular dynamics (AIMD) run performed with ABACUS; training_data and validation_data appear in the listing above because they are generated later in this section. In this tutorial, the AIMD calculation of the methane molecule has already been completed for you.

Detailed information about ABACUS can be found in its documentation. You can also find help in Section 2 of In-depth "Deep Potential" Material Calculation Guide.

DeePMD-kit adopts a compressed data format. All training data must first be converted to this format before it can be used in DeePMD-kit. The data format is explained in detail in the DeePMD-kit manual, which can be found on the DeePMD-kit GitHub page.

We provide a convenient tool called dpdata to convert data generated by VASP, CP2K, Gaussian, Quantum-Espresso, ABACUS, and LAMMPS into the compressed format of DeePMD-kit.

A snapshot of a molecular system with computational data information is called a frame. The data system consists of many frames that share the same number of atoms and atomic types.

For example, a molecular dynamics trajectory can be converted into a data system, where each time step corresponds to a frame in the system.
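To make the frame and data-system vocabulary concrete, here is a minimal sketch in plain Python. The `Frame` class and `build_system` helper are hypothetical names invented for illustration; they are not part of dpdata or DeePMD-kit, and the numbers are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Frame:
    """One snapshot: positions, forces, and total energy (hypothetical container)."""
    coords: List[Vector]   # one (x, y, z) per atom
    forces: List[Vector]   # one (fx, fy, fz) per atom
    energy: float          # total energy of the snapshot


def build_system(atom_types: List[int], frames: List[Frame]) -> dict:
    """Bundle frames into one data system after checking that they are consistent."""
    natoms = len(atom_types)
    for i, frame in enumerate(frames):
        if len(frame.coords) != natoms or len(frame.forces) != natoms:
            raise ValueError(f"frame {i} does not have {natoms} atoms")
    return {"atom_types": atom_types, "natoms": natoms,
            "nframes": len(frames), "frames": frames}


# Two made-up frames of a 5-atom molecule (placeholder values, not real data).
zero = (0.0, 0.0, 0.0)
trajectory = [Frame([zero] * 5, [zero] * 5, energy=-1.0),
              Frame([zero] * 5, [zero] * 5, energy=-1.1)]
system = build_system([0, 0, 0, 0, 1], trajectory)
print(system["nframes"], system["natoms"])
```

The check inside `build_system` mirrors the rule stated above: every frame in one data system must share the same number of atoms and the same atom types.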


Next, we use the dpdata tool to randomly split the data in abacus_md into training and validation data.

[ ]
import dpdata
help(dpdata.LabeledSystem)
[4]
import dpdata
import numpy as np

# Read data in ABACUS/MD format
data = dpdata.LabeledSystem('DeePMD-kit_Tutorial/00.data/abacus_md', fmt = 'abacus/md')
print('# The data contains %d frames' % len(data))

# Randomly select 40 frames as validation data
n_frames = len(data)
index_validation = np.random.choice(n_frames, size=40, replace=False)

# Use the remaining frames as training data
index_training = sorted(set(range(n_frames)) - set(index_validation))
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)

# Put all training data into the folder "training_data"
data_training.to_deepmd_npy('DeePMD-kit_Tutorial/00.data/training_data')

# Put all validation data into the folder "validation_data"
data_validation.to_deepmd_npy('DeePMD-kit_Tutorial/00.data/validation_data')

print('# The training data contains %d frames' % len(data_training))
print('# The validation data contains %d frames' % len(data_validation))
# The data contains 201 frames
# The training data contains 161 frames
# The validation data contains 40 frames

You can see that 161 frames were selected as training data, and the other 40 frames are validation data.
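The split itself is easy to reason about in isolation. The sketch below reproduces it with the standard library only; `split_indices` is a hypothetical helper, and the fixed seed is an assumption added so that the split is reproducible (the cell above draws a fresh random split on every run).

```python
import random


def split_indices(n_frames: int, n_validation: int, seed: int = 0):
    """Return (training, validation) index lists that are disjoint and cover all frames."""
    rng = random.Random(seed)  # fixed seed so the same split is drawn every time
    validation = sorted(rng.sample(range(n_frames), n_validation))
    held_out = set(validation)
    training = [i for i in range(n_frames) if i not in held_out]
    return training, validation


training, validation = split_indices(n_frames=201, n_validation=40)
print(len(training), len(validation))
```

Because the validation indices are sampled without replacement and the training indices are everything else, no frame can end up in both sets.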


Let's take another look at the 00.data folder, where new files have been generated. These files are the training set and validation set required for DeePMD-kit deep potential training.

[7]
! tree DeePMD-kit_Tutorial/00.data/ -L 1
DeePMD-kit_Tutorial/00.data/
├── abacus_md
├── training_data
└── validation_data

3 directories, 0 files
[8]
! tree DeePMD-kit_Tutorial/00.data/training_data -L 1
DeePMD-kit_Tutorial/00.data/training_data
├── set.000
├── type.raw
└── type_map.raw

1 directory, 2 files

The roles of these files are as follows:

  1. set.000: It is a directory that contains data in a compressed format (NumPy compressed arrays).
  2. type.raw: It is a file that contains the types of atoms (represented by integers).
  3. type_map.raw: It is a file that contains the names of the atom types.
[9]
! cat DeePMD-kit_Tutorial/00.data/training_data/type.raw
0
0
0
0
1

This tells us that there are 5 atoms in this example, with 4 atoms represented by the type "0" and 1 atom represented by the type "1". Sometimes it is necessary to map integer types to atom names. The mapping can be provided through the file type_map.raw.

[10]
! cat DeePMD-kit_Tutorial/00.data/training_data/type_map.raw
H
C

This tells us that the type "0" is named "H" and the type "1" is named "C".
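The relationship between the two files can be sketched in a few lines. The lists below are copied from the file contents printed above; the variable names are illustrative only.

```python
from collections import Counter

atom_types = [0, 0, 0, 0, 1]   # contents of type.raw shown above
type_map = ["H", "C"]          # contents of type_map.raw shown above

# Index into type_map with each integer type to recover the element names.
atom_names = [type_map[t] for t in atom_types]
composition = Counter(atom_names)

print(atom_names)
print(dict(composition))
```

Four H atoms and one C atom is the composition of a single methane molecule, which matches the system used in this tutorial.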

The detailed documentation for using dpdata for data conversion can be accessed HERE.


2 Preparing the Input Script

Once the training data is prepared, the next step is training. DeePMD-kit requires a JSON-formatted file that specifies the training parameters. This file is the input script for DeePMD-kit. Let's navigate to the training directory and examine it:

[11]
! cd DeePMD-kit_Tutorial/01.train/ && cat input.json
{
    "_comment": " model parameters",
    "model": {
	"type_map":	["H", "C"],
	"descriptor" :{
	    "type":		"se_e2_a",
	    "sel":		"auto",
	    "rcut_smth":	0.50,
	    "rcut":		6.00,
	    "neuron":		[25, 50, 100],
	    "resnet_dt":	false,
	    "axis_neuron":	16,
	    "seed":		1,
	    "_comment":		" that's all"
	},
	"fitting_net" : {
	    "neuron":		[240, 240, 240],
	    "resnet_dt":	true,
	    "seed":		1,
	    "_comment":		" that's all"
	},
	"_comment":	" that's all"
    },

    "learning_rate" :{
	"type":		"exp",
	"decay_steps":	50,
	"start_lr":	0.001,	
	"stop_lr":	3.51e-8,
	"_comment":	"that's all"
    },

    "loss" :{
	"type":		"ener",
	"start_pref_e":	0.02,
	"limit_pref_e":	1,
	"start_pref_f":	1000,
	"limit_pref_f":	1,
	"start_pref_v":	0,
	"limit_pref_v":	0,
	"_comment":	" that's all"
    },

    "training" : {
	"training_data": {
	    "systems":     ["../00.data/training_data"],
	    "batch_size":  "auto",
	    "_comment":	   "that's all"
	},
	"validation_data":{
	    "systems":	   ["../00.data/validation_data"],
	    "batch_size":  "auto",
	    "numb_btch":   1,
	    "_comment":	   "that's all"
	},
	"numb_steps":	10000,
	"seed":		10,
	"disp_file":	"lcurve.out",
	"disp_freq":	200,
	"save_freq":	1000,
	"_comment":	"that's all"
    },    

    "_comment":		"that's all"
}
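Since the input script is ordinary JSON, it can be inspected with Python's standard `json` module before a run is launched. The sketch below inlines a small fragment of the file above so that it is self-contained; the checks are hypothetical examples of what one might verify by hand and are not part of DeePMD-kit.

```python
import json

# A fragment of the input script above, inlined so the sketch is self-contained.
fragment = """
{
    "model": {"type_map": ["H", "C"],
              "descriptor": {"type": "se_e2_a", "rcut_smth": 0.50, "rcut": 6.00}},
    "training": {"numb_steps": 10000, "disp_freq": 200, "save_freq": 1000}
}
"""
config = json.loads(fragment)

descriptor = config["model"]["descriptor"]
training = config["training"]

# Hypothetical sanity checks, written for illustration only.
assert descriptor["rcut_smth"] < descriptor["rcut"], "smoothing must start inside the cutoff"
n_log_lines = training["numb_steps"] // training["disp_freq"]
n_checkpoints = training["numb_steps"] // training["save_freq"]
print(n_log_lines, n_checkpoints)
```

With these settings one would expect 50 progress lines on screen and 10 checkpoint saves over the whole run.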


In the model section, parameters for the embedding and fitting networks are specified.

"model":{
    "type_map":    ["H", "C"],                 
    "descriptor":{
        "type":            "se_e2_a",          
        "rcut":            6.00,               
        "rcut_smth":       0.50,               
        "sel":             "auto",             
        "neuron":          [25, 50, 100],       
        "resnet_dt":       false,
        "axis_neuron":     16,                  
        "seed":            1,
        "_comment":        "that's all"
        },
    "fitting_net":{
        "neuron":          [240, 240, 240],    
        "resnet_dt":       true,
        "seed":            1,
        "_comment":        "that's all"
    },
    "_comment":    "that's all"
},

Some of the parameters are explained as follows:

Parameter                  Explanation
-------------------------  ----------------------------------------------------------------------
type_map                   The name of each type of atom
descriptor > type          The type of descriptor
descriptor > rcut          The cutoff radius
descriptor > rcut_smth     The position where smoothing begins
descriptor > sel           The maximum number of the i-th type of atom within the cutoff radius
descriptor > neuron        The size of the embedding neural network
descriptor > axis_neuron   The size of the G-matrix's submatrix (embedding matrix)
fitting_net > neuron       The size of the fitting neural network

The se_e2_a descriptor is used to train the DP model. The neuron parameters set the sizes of the embedding and fitting networks to [25, 50, 100] and [240, 240, 240], respectively. The components of the local environment go smoothly to zero between 0.5 and 6 Å.
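The smooth decay between rcut_smth and rcut can be sketched with a switching function. The polynomial form below is an assumption based on the form given in the DeePMD-kit descriptor documentation; the exact expression may differ between versions, and `smooth_weight` is a hypothetical name used only here.

```python
def smooth_weight(r: float, rcut_smth: float = 0.5, rcut: float = 6.0) -> float:
    """Switched inverse distance s(r): 1/r inside rcut_smth, 0 at and beyond rcut."""
    if r <= 0.0:
        raise ValueError("r must be positive")
    if r < rcut_smth:
        return 1.0 / r
    if r >= rcut:
        return 0.0
    x = (r - rcut_smth) / (rcut - rcut_smth)                # 0 at rcut_smth, 1 at rcut
    switch = x**3 * (-6.0 * x**2 + 15.0 * x - 10.0) + 1.0   # falls smoothly from 1 to 0
    return switch / r


for r in (0.25, 0.5, 3.25, 5.9, 6.0):
    print(f"r = {r:4.2f}  s(r) = {smooth_weight(r):.6f}")
```

Below 0.5 Å the weight is simply 1/r, and the polynomial brings it to exactly zero at 6 Å, so atoms entering or leaving the cutoff sphere do not cause jumps in the descriptor.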

The following are the parameters specifying the learning rate and loss function.

    "learning_rate" :{
        "type":                "exp",
        "decay_steps":         50,
        "start_lr":            0.001,    
        "stop_lr":             3.51e-8,
        "_comment":            "that's all"
    },
    "loss" :{
        "type":                "ener",
        "start_pref_e":        0.02,
        "limit_pref_e":        1,
        "start_pref_f":        1000,
        "limit_pref_f":        1,
        "start_pref_v":        0,
        "limit_pref_v":        0,
        "_comment":            "that's all"
    },

In the loss function, pref_e gradually increases from 0.02 to 1 and pref_f gradually decreases from 1000 to 1, so the force term dominates at the beginning of training and the energy term gains weight toward the end. This strategy is very effective and reduces the total training time. pref_v is set to 0, so virial data is not included in the training. The initial learning rate, final learning rate, and decay steps are set to 0.001, 3.51e-8, and 50, respectively. The model is trained for 10000 steps.
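A short sketch shows how the prefactors move between their start and limit values. It assumes that each prefactor is interpolated linearly in the ratio lr(t)/start_lr, which is how the DeePMD-kit loss documentation describes it; `prefactor` is a hypothetical helper written for this illustration.

```python
def prefactor(start_pref: float, limit_pref: float, lr: float, start_lr: float) -> float:
    """Interpolate a loss prefactor between its start and limit values as lr decays."""
    ratio = lr / start_lr  # 1 at the first step, close to 0 at the end of training
    return limit_pref * (1.0 - ratio) + start_pref * ratio


start_lr, stop_lr = 0.001, 3.51e-8

for label, lr in (("start", start_lr), ("end", stop_lr)):
    pref_e = prefactor(0.02, 1.0, lr, start_lr)
    pref_f = prefactor(1000.0, 1.0, lr, start_lr)
    pref_v = prefactor(0.0, 0.0, lr, start_lr)
    print(f"{label:>5}: pref_e = {pref_e:.4f}, pref_f = {pref_f:.4f}, pref_v = {pref_v:.1f}")
```

Under this assumption the force prefactor is 50000 times the energy prefactor at the first step, while the two are nearly equal at the last step.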

The training parameters are shown as follows:

    "training" : {
        "training_data": {
            "systems":            ["../00.data/training_data"],     
            "batch_size":         "auto",                       
            "_comment":           "that's all"
        },
        "validation_data":{
            "systems":            ["../00.data/validation_data/"],
            "batch_size":         "auto",               
            "numb_btch":          1,
            "_comment":           "that's all"
        },
        "numb_steps":             10000,                           
        "seed":                   10,
        "disp_file":              "lcurve.out",
        "disp_freq":              200,
        "save_freq":              1000,
        "_comment":               "that's all"
    },

3 Training the Model

With the training script ready, we can start the training by simply running DeePMD-kit.

[5]
# ########## Time Warning: 8 mins 48 secs ##########
! cd DeePMD-kit_Tutorial/01.train/ && dp train input.json
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
DEEPMD INFO    Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step)
2024-07-10 02:20:00.763483: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:20:00.763528: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0,1
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 2 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 2 threads/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1 
OMP: Info #254: KMP_AFFINITY: pid 99 tid 109 thread 1 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 99 tid 111 thread 2 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 99 tid 108 thread 3 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 99 tid 112 thread 4 bound to OS proc set 0
DEEPMD INFO    training data with min nbor dist: 1.0460506586976834
DEEPMD INFO    training data with max nbor size: [4 1]
DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
DEEPMD INFO    Please read and cite:
DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO    installed to:         /home/conda/feedstock_root/build_artifacts/deepmd-kit_1678943793317/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO    source :              v2.2.1
DEEPMD INFO    source brach:         HEAD
DEEPMD INFO    source commit:        3ac8c4c7
DEEPMD INFO    source commit at:     2023-03-16 12:33:24 +0800
DEEPMD INFO    build float prec:     double
DEEPMD INFO    build variant:        cuda
DEEPMD INFO    build with tf inc:    /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/../../../../include
DEEPMD INFO    build with tf lib:    
DEEPMD INFO    ---Summary of the training---------------------------------------
DEEPMD INFO    running on:           bohrium-16664-1159987
DEEPMD INFO    computing device:     cpu:0
DEEPMD INFO    CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO    Count of visible GPU: 0
DEEPMD INFO    num_intra_threads:    0
DEEPMD INFO    num_inter_threads:    0
DEEPMD INFO    -----------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: training     -----------------------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                                        system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO                      ../00.data/training_data       5       7      23  1.000    T
DEEPMD INFO    --------------------------------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: validation   -----------------------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                                        system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO                    ../00.data/validation_data       5       7       5  1.000    T
DEEPMD INFO    --------------------------------------------------------------------------------------
DEEPMD INFO    training without frame parameter
DEEPMD INFO    data stating... (this step may take long time)
OMP: Info #254: KMP_AFFINITY: pid 99 tid 99 thread 0 bound to OS proc set 0
DEEPMD INFO    built lr
DEEPMD INFO    built network
DEEPMD INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
DEEPMD INFO    initialize model from scratch
DEEPMD INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.950006, final lr will be 3.51e-08
DEEPMD INFO    batch     200 training time 11.36 s, testing time 0.04 s
DEEPMD INFO    batch     400 training time 10.01 s, testing time 0.04 s
DEEPMD INFO    batch     600 training time 9.97 s, testing time 0.03 s
DEEPMD INFO    batch     800 training time 10.01 s, testing time 0.04 s
DEEPMD INFO    batch    1000 training time 10.07 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    1200 training time 10.15 s, testing time 0.04 s
DEEPMD INFO    batch    1400 training time 10.05 s, testing time 0.04 s
DEEPMD INFO    batch    1600 training time 10.13 s, testing time 0.04 s
DEEPMD INFO    batch    1800 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    2000 training time 10.02 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    2200 training time 10.12 s, testing time 0.04 s
DEEPMD INFO    batch    2400 training time 10.18 s, testing time 0.04 s
DEEPMD INFO    batch    2600 training time 10.01 s, testing time 0.04 s
DEEPMD INFO    batch    2800 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    3000 training time 10.05 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    3200 training time 10.02 s, testing time 0.03 s
DEEPMD INFO    batch    3400 training time 10.10 s, testing time 0.05 s
DEEPMD INFO    batch    3600 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    batch    3800 training time 10.05 s, testing time 0.03 s
DEEPMD INFO    batch    4000 training time 10.11 s, testing time 0.03 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    4200 training time 9.99 s, testing time 0.04 s
DEEPMD INFO    batch    4400 training time 10.01 s, testing time 0.04 s
DEEPMD INFO    batch    4600 training time 10.00 s, testing time 0.03 s
DEEPMD INFO    batch    4800 training time 9.99 s, testing time 0.04 s
DEEPMD INFO    batch    5000 training time 9.98 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    5200 training time 10.01 s, testing time 0.04 s
DEEPMD INFO    batch    5400 training time 9.99 s, testing time 0.04 s
DEEPMD INFO    batch    5600 training time 9.96 s, testing time 0.03 s
DEEPMD INFO    batch    5800 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    6000 training time 10.04 s, testing time 0.03 s
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/training/saver.py:1066: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/training/saver.py:1066: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    6200 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    batch    6400 training time 10.10 s, testing time 0.03 s
DEEPMD INFO    batch    6600 training time 10.10 s, testing time 0.04 s
DEEPMD INFO    batch    6800 training time 10.12 s, testing time 0.04 s
DEEPMD INFO    batch    7000 training time 10.18 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    7200 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    7400 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    batch    7600 training time 10.10 s, testing time 0.04 s
DEEPMD INFO    batch    7800 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    8000 training time 10.13 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    8200 training time 10.16 s, testing time 0.03 s
DEEPMD INFO    batch    8400 training time 10.04 s, testing time 0.03 s
DEEPMD INFO    batch    8600 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    8800 training time 10.13 s, testing time 0.04 s
DEEPMD INFO    batch    9000 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    9200 training time 10.19 s, testing time 0.04 s
DEEPMD INFO    batch    9400 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    batch    9600 training time 10.06 s, testing time 0.04 s
DEEPMD INFO    batch    9800 training time 10.09 s, testing time 0.04 s
DEEPMD INFO    batch   10000 training time 10.08 s, testing time 0.04 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    average training time: 0.0503 s/batch (exclude first 200 batches)
DEEPMD INFO    finished training
DEEPMD INFO    wall time: 520.318 s

The information from the data system will be displayed on the screen, for example:

DEEPMD INFO    -----------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: training     ----------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                                 system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO               ../00.data/training_data       5       7      23  1.000    T
DEEPMD INFO    -------------------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: validation   ----------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                                 system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO             ../00.data/validation_data       5       7       5  1.000    T
DEEPMD INFO    -------------------------------------------------------------------------

And the initial and final learning rates for this training:

DEEPMD INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.950006, final lr will be 3.51e-08

If everything goes well, you will see information printed every 200 batches, for example:

DEEPMD INFO    batch     200 training time 6.04 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 4.80 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 4.80 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 4.78 s, testing time 0.02 s
DEEPMD INFO    batch    1000 training time 4.77 s, testing time 0.02 s
DEEPMD INFO    saved checkpoint model.ckpt
DEEPMD INFO    batch    1200 training time 4.47 s, testing time 0.02 s
DEEPMD INFO    batch    1400 training time 4.49 s, testing time 0.02 s
DEEPMD INFO    batch    1600 training time 4.45 s, testing time 0.02 s
DEEPMD INFO    batch    1800 training time 4.44 s, testing time 0.02 s
DEEPMD INFO    batch    2000 training time 4.46 s, testing time 0.02 s
DEEPMD INFO    saved checkpoint model.ckpt

These lines report the time spent on training and testing. Every 1000 batches, the model is saved to the TensorFlow checkpoint file model.ckpt.
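The per-batch cost can be estimated directly from these lines. The sketch below parses the first few example lines shown above with a regular expression and assumes that each reported training time covers the 200 batches since the previous line, which is consistent with the average reported at the end of the full training log. The pattern is written for this log layout only.

```python
import re

log = """\
DEEPMD INFO    batch     200 training time 6.04 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 4.80 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 4.80 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 4.78 s, testing time 0.02 s
DEEPMD INFO    batch    1000 training time 4.77 s, testing time 0.02 s
DEEPMD INFO    saved checkpoint model.ckpt
"""

pattern = re.compile(r"batch\s+(\d+) training time ([\d.]+) s, testing time ([\d.]+) s")
records = [(int(m[1]), float(m[2]), float(m[3])) for m in pattern.finditer(log)]

# Skip the first interval, which includes start-up overhead.
steady = records[1:]
batches = steady[-1][0] - records[0][0]
seconds_per_batch = sum(train_time for _, train_time, _ in steady) / batches
print(f"{seconds_per_batch:.4f} s/batch")
```

Lines that do not match the pattern, such as the checkpoint message, are ignored.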

At the same time, the training and testing errors will be presented in the file lcurve.out. This file contains 8 columns, from left to right they are:

  1. Training step
  2. Validation loss
  3. Training loss
  4. Root mean square (RMS) validation error of energy
  5. RMS training error of energy
  6. RMS validation error of force
  7. RMS training error of force
  8. Learning rate

Learning rate is an important concept in machine learning. In the DP model, the learning rate decays exponentially from a large value to a small one, which balances fast convergence early in training against accuracy at the end. The learning rate section therefore takes both an initial learning rate (start_lr) and a final learning rate (stop_lr). In the example above, we set the initial learning rate, final learning rate, and decay steps to 0.001, 3.51e-8, and 50, respectively. The learning rate therefore starts at 0.001 and is reduced slightly every 50 steps until it reaches 3.51e-8 (or until the training ends).
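The decay rate printed in the training log can be reproduced from these three numbers. The sketch below assumes a staircase schedule in which the rate is applied once per completed interval of decay_steps; the function names are hypothetical.

```python
def decay_rate(start_lr: float, stop_lr: float, decay_steps: int, numb_steps: int) -> float:
    """Per-interval decay factor that takes start_lr to stop_lr over numb_steps."""
    n_intervals = numb_steps / decay_steps
    return (stop_lr / start_lr) ** (1.0 / n_intervals)


def learning_rate(step: int, start_lr: float, rate: float, decay_steps: int) -> float:
    """Staircase schedule: the rate is applied once per completed decay interval."""
    return start_lr * rate ** (step // decay_steps)


rate = decay_rate(start_lr=0.001, stop_lr=3.51e-8, decay_steps=50, numb_steps=10000)
print(f"decay_rate = {rate:.6f}")
for step in (0, 9800, 10000):
    print(f"step {step:>5}: lr = {learning_rate(step, 0.001, rate, 50):.1e}")
```

The computed decay rate agrees with the value 0.950006 in the log line quoted above, so each 50-step interval lowers the learning rate by about 5%.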


Let's take a look at the beginning and end lines of the lcurve.out file.

[6]
! cd DeePMD-kit_Tutorial/01.train/ && head -n 2 lcurve.out && tail -n 2 lcurve.out
#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      1.98e+01    1.93e+01      1.38e-01    1.36e-01      6.26e-01    6.09e-01    1.0e-03
   9800      4.68e-02    5.20e-02      9.18e-04    7.10e-04      4.57e-02    5.09e-02    4.3e-08
  10000      5.53e-02    4.32e-02      7.53e-04    7.82e-04      5.44e-02    4.25e-02    3.5e-08
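These four lines are enough to quantify how far the errors dropped. The sketch below parses them with the standard library only; the text is copied from the output above, and the reduction factor is just the ratio of the first and last values.

```python
lcurve_text = """\
#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      1.98e+01    1.93e+01      1.38e-01    1.36e-01      6.26e-01    6.09e-01    1.0e-03
   9800      4.68e-02    5.20e-02      9.18e-04    7.10e-04      4.57e-02    5.09e-02    4.3e-08
  10000      5.53e-02    4.32e-02      7.53e-04    7.82e-04      5.44e-02    4.25e-02    3.5e-08
"""

lines = lcurve_text.strip().splitlines()
headers = lines[0].lstrip("#").split()          # column names from the comment line
rows = [dict(zip(headers, map(float, line.split()))) for line in lines[1:]]

first, last = rows[0], rows[-1]
for column in ("rmse_e_val", "rmse_f_val"):
    print(f"{column}: {first[column]:.2e} -> {last[column]:.2e} "
          f"(reduced {first[column] / last[column]:.0f}x)")
```

The validation and training errors in the last rows are of similar size, which is the kind of agreement one hopes to see when checking for overfitting.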

The loss function can be visualized to monitor the training process.

[7]
!/opt/mamba/bin/pip3 install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: matplotlib in /opt/mamba/lib/python3.10/site-packages (3.7.1)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (23.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (4.39.3)
Requirement already satisfied: numpy>=1.20 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.24.2)
Requirement already satisfied: pillow>=6.2.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (9.5.0)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.0.7)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[8]
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

with open("./DeePMD-kit_Tutorial/01.train/lcurve.out") as f:
    headers = f.readline().split()[1:]
lcurve = pd.DataFrame(np.loadtxt("./DeePMD-kit_Tutorial/01.train/lcurve.out"), columns=headers)
legends = ["rmse_e_val", "rmse_e_trn", "rmse_f_val" , "rmse_f_trn" ]

for legend in legends:
    plt.loglog(lcurve["step"], lcurve[legend], label = legend )
plt.legend()
plt.xlabel("Training steps")
plt.ylabel("Loss")
plt.show()

4 Freezing the Model

At the end of training, the model parameters saved in TensorFlow's checkpoint file should be frozen into a model file, which usually has the .pb extension. Simply run the following command:

[9]
## navigate in DeePMD-kit_Tutorial/01.train/ directory and freeze the model
! cd DeePMD-kit_Tutorial/01.train/ && dp freeze -o graph.pb
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
2024-07-10 02:29:37.570599: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:29:37.570641: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
DEEPMD INFO    The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD INFO    1211 ops in the final graph.

This writes a model file named graph.pb to the directory in which the command ran (DeePMD-kit_Tutorial/01.train/).


So far, we have used DeePMD-kit to obtain a Deep Potential model trained on high-precision ab initio molecular dynamics data: DeePMD-kit_Tutorial/01.train/graph.pb


5 Compressing the Model

Compressing a DP model typically speeds up DP-based calculations by about an order of magnitude and reduces memory consumption. The model graph.pb can be compressed as follows:

[11]
## Navigate to the DeePMD-kit_Tutorial/01.train/ directory and compress the model
! cd DeePMD-kit_Tutorial/01.train/ && dp compress -i graph.pb -o compress.pb
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
2024-07-10 02:37:20.748939: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:37:20.748999: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
DEEPMD INFO    


DEEPMD INFO    stage 1: compress the model
DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
DEEPMD INFO    Please read and cite:
DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO    installed to:         /home/conda/feedstock_root/build_artifacts/deepmd-kit_1678943793317/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO    source :              v2.2.1
DEEPMD INFO    source brach:         HEAD
DEEPMD INFO    source commit:        3ac8c4c7
DEEPMD INFO    source commit at:     2023-03-16 12:33:24 +0800
DEEPMD INFO    build float prec:     double
DEEPMD INFO    build variant:        cuda
DEEPMD INFO    build with tf inc:    /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/../../../../include
DEEPMD INFO    build with tf lib:    
DEEPMD INFO    ---Summary of the training---------------------------------------
DEEPMD INFO    running on:           bohrium-16664-1159987
DEEPMD INFO    computing device:     cpu:0
DEEPMD INFO    CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO    Count of visible GPU: 0
DEEPMD INFO    num_intra_threads:    0
DEEPMD INFO    num_inter_threads:    0
DEEPMD INFO    -----------------------------------------------------------------
DEEPMD INFO    training without frame parameter
DEEPMD INFO    training data with lower boundary: [-0.9291997  -0.99961742]
DEEPMD INFO    training data with upper boundary: [1.97524068 1.1051857 ]
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0,1
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 2 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 2 threads/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1 
OMP: Info #254: KMP_AFFINITY: pid 179 tid 179 thread 0 bound to OS proc set 0
DEEPMD INFO    built lr
DEEPMD INFO    built network
DEEPMD INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
DEEPMD INFO    initialize model from scratch
DEEPMD INFO    finished compressing
DEEPMD INFO    


DEEPMD INFO    stage 2: freeze the model
DEEPMD INFO    The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD INFO    847 ops in the final graph.
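As background that this tutorial does not spell out, DeePMD-kit's compression replaces evaluation of the embedding network with a tabulated polynomial approximation over the input range seen in training (the "lower boundary" and "upper boundary" lines in the log above). The toy sketch below illustrates only the general idea of trading a function evaluation for a table lookup plus interpolation. It uses linear interpolation of `tanh`, which is a simplification assumed here for brevity and is not DeePMD-kit's actual scheme; all names are hypothetical.

```python
import math


def build_table(func, lower, upper, step):
    """Tabulate func on a uniform grid from lower to upper (inclusive)."""
    n = int(round((upper - lower) / step))
    return [func(lower + i * step) for i in range(n + 1)]


def lookup(table, lower, step, x):
    """Linearly interpolate the tabulated function at x (x must lie inside the table range)."""
    pos = (x - lower) / step
    i = min(int(pos), len(table) - 2)  # index of the grid point to the left of x
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac


LOWER, UPPER, STEP = -2.0, 2.0, 0.01
table = build_table(math.tanh, LOWER, UPPER, STEP)

# Compare the table lookup with the exact function at points off the grid.
test_points = [-1.995 + 0.0137 * k for k in range(290)]
max_err = max(abs(lookup(table, LOWER, STEP, x) - math.tanh(x)) for x in test_points)
print(f"table size: {len(table)}, max interpolation error: {max_err:.2e}")
```

A finer grid or a higher-order interpolant lowers the error at the cost of a larger table, which is the accuracy and memory trade-off behind any tabulated model.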

6 Testing the Model

Let's check the accuracy of the trained model on the validation data:

[12]
! cd DeePMD-kit_Tutorial/01.train/ && dp test -m graph.pb -s ../00.data/validation_data
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
2024-07-10 02:37:51.740912: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:37:51.740961: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
DEEPMD WARNING You can use the environment variable DP_INFER_BATCH_SIZE tocontrol the inference batch size (nframes * natoms). The default value is 1024.
DEEPMD INFO    # ---------------output of dp test--------------- 
DEEPMD INFO    # testing system : ../00.data/validation_data
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0,1
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 2 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 2 threads/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1 
OMP: Info #254: KMP_AFFINITY: pid 192 tid 196 thread 1 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 192 tid 199 thread 2 bound to OS proc set 0
DEEPMD INFO    # number of test data : 40 
DEEPMD INFO    Energy MAE         : 3.360696e-03 eV
DEEPMD INFO    Energy RMSE        : 4.048368e-03 eV
DEEPMD INFO    Energy MAE/Natoms  : 6.721391e-04 eV
DEEPMD INFO    Energy RMSE/Natoms : 8.096737e-04 eV
DEEPMD INFO    Force  MAE         : 3.785041e-02 eV/A
DEEPMD INFO    Force  RMSE        : 4.851145e-02 eV/A
DEEPMD INFO    Virial MAE         : 5.403371e-02 eV
DEEPMD INFO    Virial RMSE        : 6.898294e-02 eV
DEEPMD INFO    Virial MAE/Natoms  : 1.080674e-02 eV
DEEPMD INFO    Virial RMSE/Natoms : 1.379659e-02 eV
DEEPMD INFO    # ----------------------------------------------- 
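`dp test` reports both the mean absolute error (MAE) and the root-mean-square error (RMSE), and also gives each divided by the number of atoms. A minimal sketch of these definitions follows, together with a check that the per-atom energy figures above equal the totals divided by 5 (one C and four H atoms in methane). The function names and the toy error values are illustrative assumptions.

```python
import math


def mae(errors):
    """Mean absolute error of a sequence of (predicted - reference) differences."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root-mean-square error; large deviations weigh more heavily than in MAE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy differences, only to show that RMSE is never smaller than MAE.
toy_errors = [0.002, -0.004, 0.001, -0.005]
print(mae(toy_errors), rmse(toy_errors))

# Per-atom values from the dp test output above: total / number of atoms.
N_ATOMS = 5  # CH4
energy_mae_per_atom = 3.360696e-03 / N_ATOMS
energy_rmse_per_atom = 4.048368e-03 / N_ATOMS
print(f"{energy_mae_per_atom:.6e} {energy_rmse_per_atom:.6e}")
```

The recomputed values agree with the reported `Energy MAE/Natoms` and `Energy RMSE/Natoms` up to rounding in the last printed digit.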

Next, let's predict the energies of the training data with the model and compare them visually with the original DFT energies.

[13]
import dpdata

training_systems = dpdata.LabeledSystem("./DeePMD-kit_Tutorial/00.data/training_data", fmt = "deepmd/npy") # Get the training data points
predict = training_systems.predict("./DeePMD-kit_Tutorial/01.train/graph.pb") # Get the predicted data points
2024-07-10 02:38:35.031947: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-10 02:38:37.254219: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:38:37.255107: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:38:37.255123: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /opt/mamba/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:tensorflow:From /opt/mamba/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-07-10 02:38:39.548987: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-10 02:38:39.551735: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:38:39.551776: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2024-07-10 02:38:39.551799: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (bohrium-16664-1159987): /proc/driver/nvidia/version does not exist
2024-07-10 02:38:39.569416: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
WARNING:tensorflow:From /opt/mamba/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
WARNING:deepmd.utils.batch_size:You can use the environment variable DP_INFER_BATCH_SIZE tocontrol the inference batch size (nframes * natoms). The default value is 1024.
[14]
import matplotlib.pyplot as plt
import numpy as np

plt.scatter(training_systems["energies"], predict["energies"])

x_range = np.linspace(plt.xlim()[0], plt.xlim()[1])

plt.plot(x_range, x_range, "r--", linewidth = 0.25)
plt.xlabel("Energy of DFT") # Set the X-axis title
plt.ylabel("Energy predicted by deep potential") # Set the Y-axis title
plt.show()
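The scatter plot shows the agreement visually; a single number can complement it. The sketch below computes the Pearson correlation coefficient and the RMSE between two lists of energies in plain Python. The toy values are placeholders; in the notebook one would pass the flattened `training_systems["energies"]` and `predict["energies"]` arrays instead. The function names are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse_between(x, y):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Placeholder energies in eV (not taken from the actual data set).
reference = [-219.78, -219.77, -219.76, -219.75]
predicted = [-219.781, -219.768, -219.761, -219.752]

r_value = pearson_r(reference, predicted)
print(f"Pearson r = {r_value:.4f}, RMSE = {rmse_between(reference, predicted):.4e} eV")
```

A value of r close to 1 means the points lie near a straight line, and a small RMSE confirms that this line is the diagonal drawn in the plot rather than a shifted or scaled one.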

7 MD calculation with LAMMPS

This model can drive molecular dynamics simulations in LAMMPS.

[15]
! cd ./DeePMD-kit_Tutorial/02.lmp && cp ../01.train/graph.pb ./ && tree -L 1
.
├── conf.lmp
├── graph.pb
└── in.lammps

0 directories, 3 files

Here, conf.lmp gives the initial configuration for the molecular dynamics simulation of gas-phase methane.


The file in.lammps is an input script for LAMMPS. You can check in.lammps and find that it is quite a standard LAMMPS molecular dynamics simulation input file (for more information on LAMMPS molecular dynamics simulation input files, you can read 「A Very Detailed Guide to "Deep Potential" Material Calculations | Chapter 1」).

Only two lines are specific to DeePMD-kit:

pair_style  deepmd graph.pb
pair_coeff  * *

Here, the deepmd pair_style is invoked with the model file graph.pb, so the interatomic interactions are computed by the DP model stored in graph.pb.
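To confirm which model file an input script points to, one can scan it for the `pair_style deepmd` line. The sketch below does this on an inline string that contains only the two lines quoted above. The helper name `find_deepmd_models` is hypothetical, and a real in.lammps contains many other commands that are not shown in this tutorial.

```python
def find_deepmd_models(script_text):
    """Return the model files listed after 'pair_style deepmd', or [] if there is no such line."""
    for raw in script_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop trailing comments, then tokenize
        if len(tokens) >= 2 and tokens[0] == "pair_style" and tokens[1] == "deepmd":
            # Tokens ending in .pb are treated as model files (an assumption of this sketch).
            return [tok for tok in tokens[2:] if tok.endswith(".pb")]
    return []


snippet = """\
pair_style  deepmd graph.pb
pair_coeff  * *
"""

print(find_deepmd_models(snippet))
```

The same check applied to the file on disk would read `./DeePMD-kit_Tutorial/02.lmp/in.lammps` and pass its text to the function.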

In an environment with a compatible version of LAMMPS, deep potential molecular dynamics simulation can be executed with the following command:

[16]
! cd ./DeePMD-kit_Tutorial/02.lmp && lmp -i in.lammps
Warning:
This LAMMPS executable is in a conda environment, but the environment has
not been activated. Libraries may fail to load. To activate this environment
please see https://conda.io/activation.
LAMMPS (23 Jun 2022 - Update 1)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Loaded 1 plugins from /opt/deepmd-kit-2.2.1/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (10.114259 10.263124 10.216793) with tilt (0.036749877 0.13833062 -0.056322169)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  5 atoms
  read_data CPU = 0.012 seconds
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /opt/deepmd-kit-2.2.1
  source:             v2.2.1
  source branch:       HEAD
  source commit:      3ac8c4c7
  source commit at:   2023-03-16 12:33:24 +0800
  surpport model ver.:1.1 
  build variant:      cuda
  build with tf inc:  /opt/deepmd-kit-2.2.1/include;/opt/deepmd-kit-2.2.1/include
  build with tf lib:  /opt/deepmd-kit-2.2.1/lib/libtensorflow_cc.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at:  /opt/deepmd-kit-2.2.1
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit: Successfully load libcudart.so
2024-07-10 02:40:48.199790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-10 02:40:48.205094: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2024-07-10 02:40:48.205124: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2024-07-10 02:40:48.205149: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (bohrium-16664-1159987): /proc/driver/nvidia/version does not exist
2024-07-10 02:40:48.206778: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2024-07-10 02:40:48.259249: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
  >>> Info of model(s):
  using   1 model(s): graph.pb 
  rcut in model:      6
  ntypes in model:    2

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7
  ghost atom cutoff = 7
  binsize = 3.5, bins = 3 3 3
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.001
Per MPI rank memory allocation (min/avg/max) = 3.809 | 3.809 | 3.809 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -219.77031      0.025852029   -219.74446      50            -822.49232      1060.5429    
       100  -219.76464      0.020095149   -219.74455      38.865709     -779.63202      1060.5429    
       200  -219.77378      0.027782066   -219.746        53.732854     -647.97367      1060.5429    
       300  -219.77702      0.029854226   -219.74717      57.740586     -350.10302      1060.5429    
       400  -219.78434      0.035056028   -219.74929      67.801309      113.88001      1060.5429    
       500  -219.78313      0.031576851   -219.75156      61.072287      580.6186       1060.5429    
       600  -219.7756       0.021923343   -219.75368      42.40159       812.3297       1060.5429    
       700  -219.78108      0.023820451   -219.75726      46.070758      732.91804      1060.5429    
       800  -219.78308      0.021974712   -219.7611       42.500943      498.86378      1060.5429    
       900  -219.78223      0.017310548   -219.76492      33.480058      226.54755      1060.5429    
      1000  -219.7867       0.018039368   -219.76866      34.889656     -99.405242      1060.5429    
      1100  -219.78739      0.015950019   -219.77144      30.84868      -368.47626      1060.5429    
      1200  -219.78804      0.014223823   -219.77382      27.510071     -538.23678      1060.5429    
      1300  -219.78442      0.009168641   -219.77525      17.732923     -554.96154      1060.5429    
      1400  -219.78821      0.011720654   -219.77649      22.668731     -465.29996      1060.5429    
      1500  -219.78975      0.012687905   -219.77706      24.539476     -279.10681      1060.5429    
      1600  -219.78827      0.011200658   -219.77707      21.663015     -29.13483       1060.5429    
      1700  -219.78642      0.010006422   -219.77641      19.353263      243.6751       1060.5429    
      1800  -219.78579      0.010906011   -219.77488      21.093144      490.65047      1060.5429    
      1900  -219.78573      0.013373885   -219.77236      25.866219      624.00024      1060.5429    
      2000  -219.78225      0.014287484   -219.76796      27.633197      619.02662      1060.5429    
      2100  -219.78467      0.022521156   -219.76215      43.557812      442.21135      1060.5429    
      2200  -219.78438      0.028823738   -219.75555      55.747536      175.23192      1060.5429    
      2300  -219.78501      0.034010184   -219.751        65.778558     -188.90178      1060.5429    
      2400  -219.77587      0.028191148   -219.74768      54.524054     -557.11735      1060.5429    
      2500  -219.77369      0.027810336   -219.74588      53.78753      -781.51594      1060.5429    
      2600  -219.78217      0.036716367   -219.74545      71.012545     -762.48742      1060.5429    
      2700  -219.77923      0.034033695   -219.7452       65.82403      -522.27277      1060.5429    
      2800  -219.77753      0.031754385   -219.74577      61.415653     -232.12544      1060.5429    
      2900  -219.77919      0.032601519   -219.74659      63.054081      61.069617      1060.5429    
      3000  -219.78067      0.033007232   -219.74766      63.838765      391.91055      1060.5429    
      3100  -219.77437      0.025678312   -219.7487       49.664016      701.85643      1060.5429    
      3200  -219.77692      0.026349764   -219.75057      50.962661      844.2123       1060.5429    
      3300  -219.78329      0.030588548   -219.7527       59.160827      662.97194      1060.5429    
      3400  -219.78856      0.033237528   -219.75532      64.284177      293.15421      1060.5429    
      3500  -219.78095      0.02378893    -219.75716      46.009793     -83.926909      1060.5429    
      3600  -219.7769       0.017962176   -219.75893      34.740361     -406.01025      1060.5429    
      3700  -219.77813      0.017584279   -219.76054      34.009475     -627.44542      1060.5429    
      3800  -219.77929      0.017350478   -219.76194      33.557284     -715.87346      1060.5429    
      3900  -219.78244      0.019080186   -219.76336      36.902686     -603.04337      1060.5429    
      4000  -219.78846      0.023776304   -219.76468      45.985373     -327.24592      1060.5429    
      4100  -219.792        0.026501155   -219.7655       51.255464      17.488112      1060.5429    
      4200  -219.7892       0.023318115   -219.76589      45.099196      308.73311      1060.5429    
      4300  -219.77882      0.01343071    -219.76539      25.976124      550.0542       1060.5429    
      4400  -219.77641      0.011723519   -219.76468      22.674272      690.40336      1060.5429    
      4500  -219.77881      0.015309962   -219.7635       29.610755      639.09389      1060.5429    
      4600  -219.78066      0.019182412   -219.76147      37.1004        395.64924      1060.5429    
      4700  -219.78544      0.027123835   -219.75832      52.459779      24.35561       1060.5429    
      4800  -219.78352      0.030550771   -219.75297      59.087763     -347.77022      1060.5429    
      4900  -219.78209      0.035013865   -219.74708      67.719761     -675.27941      1060.5429    
      5000  -219.76129      0.021421171   -219.73987      41.430347     -841.05433      1060.5429    
Loop time of 18.2714 on 1 procs for 5000 steps with 5 atoms

Performance: 23.643 ns/day, 1.015 hours/ns, 273.651 timesteps/s
170.9% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 18.203     | 18.203     | 18.203     |   0.0 | 99.63
Neigh   | 0.0076709  | 0.0076709  | 0.0076709  |   0.0 |  0.04
Comm    | 0.017699   | 0.017699   | 0.017699   |   0.0 |  0.10
Output  | 0.0048833  | 0.0048833  | 0.0048833  |   0.0 |  0.03
Modify  | 0.029036   | 0.029036   | 0.029036   |   0.0 |  0.16
Other   |            | 0.00877    |            |       |  0.05

Nlocal:              5 ave           5 max           5 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:            130 ave         130 max         130 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:           20 ave          20 max          20 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 20
Ave neighs/atom = 4
Neighbor list builds = 500
Dangerous builds not checked
Total wall time: 0:00:19
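The performance line in the LAMMPS summary follows directly from the loop time. As a worked check, the sketch below recomputes it from the values printed above: 5000 steps, a time step of 0.001 ps (metal units), and a loop time of 18.2714 s. The function name is an illustrative assumption, and small differences in the last digit come from the rounded loop time.

```python
def md_performance(n_steps, timestep_ps, loop_time_s):
    """Return (ns_per_day, hours_per_ns, steps_per_s) for an MD run."""
    simulated_ns = n_steps * timestep_ps / 1000.0       # ps -> ns
    ns_per_day = simulated_ns / loop_time_s * 86400.0   # 86400 seconds per day
    hours_per_ns = 24.0 / ns_per_day
    steps_per_s = n_steps / loop_time_s
    return ns_per_day, hours_per_ns, steps_per_s


# Values taken from the LAMMPS log above.
ns_per_day, hours_per_ns, steps_per_s = md_performance(
    n_steps=5000, timestep_ps=0.001, loop_time_s=18.2714
)
print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hours/ns, {steps_per_s:.3f} timesteps/s")
```

The run therefore covers 5 ps of simulated time, and nearly all of the wall time (the `Pair` row, 99.63%) is spent evaluating the DP model.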

This is a "Deep Potential Molecular Dynamics" (DeePMD-kit) quick start guide, which allows you to quickly understand the paradigm cycle of DeePMD-kit and apply it to your project.

The accompanying video report for this article can be found in the link below. Due to the settings of the video website, the display quality on this webpage may not be optimal, so it may be necessary to visit the original website for a clearer viewing experience.


Reference

  1. Han Wang, Linfeng Zhang, Jiequn Han, and Weinan E. DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Comm., 228:178–184, 2018. doi:10.1016/j.cpc.2018.03.016.

  2. Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, and Han Wang. DeePMD-kit v2: A software package for Deep Potential models. 2023. doi:10.48550/arXiv.2304.09409.

  3. https://docs.deepmodeling.com/projects/deepmd/en/master/index.html

  4. https://github.com/deepmodeling/deepmd-kit
