DeePMD-kit Quick Start Tutorial
©️ Copyright 2024 @ Authors
📖 Getting Started Guide
Licensing Agreement: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This document can be executed directly on the Bohrium Notebook. To begin, click the Connect button located at the top of the interface. We have already set up the recommended image DeePMD-kit:2.2.1-cuda11.6-notebook and the recommended machine type c32_m64_cpu for you.
This is a quick start guide for "Deep Potential" molecular dynamics using DeePMD-kit, through which you can quickly understand the paradigm cycle that DeePMD-kit operates in and apply it to your projects.
Deep Potential is the convergence of machine learning and physical principles, presenting a new computational paradigm as shown in the figure below.
Figure | A new computational paradigm, composed of Molecular Modeling, Machine Learning, and High-Performance Computing (HPC).
Task
Master the paradigm cycle of using DeePMD-kit to build Deep Potential molecular dynamics models, and follow a complete case study to learn how to apply it to molecular dynamics tasks.
By the end of this tutorial, you will be able to:
- Prepare the formatted dataset and the scripts for training with DeePMD-kit;
- Train, freeze, and test DeePMD-kit models;
- Use DeePMD-kit in LAMMPS for calculations.
Work through this tutorial. It will take you 20 minutes, max!
Background
In this tutorial, we will take the gaseous methane molecule as an example to provide a detailed introduction to the training and application of the Deep Potential (DP) model.
DeePMD-kit is a software tool that employs neural networks to fit potential energy models based on first-principles data for molecular dynamics simulations. Without manual intervention, it can end-to-end transform the data provided by users into a deep potential model in a matter of hours. This model can seamlessly integrate with common molecular dynamics simulation software (like LAMMPS, OpenMM, and GROMACS).
DeePMD-kit significantly elevates the limits of molecular dynamics through high-performance computing and machine learning, achieving system scales of up to hundreds of millions of atoms while still maintaining the high accuracy of "ab initio" calculations. The simulation time scale is improved by at least 1000 times compared to traditional methods. Its achievements earned the 2020 ACM Gordon Bell Prize, one of the highest honors in the field of high-performance computing, and it has been used by over a thousand research groups in physics, chemistry, materials science, biology, and other fields globally.
For more detailed usage, refer to the DeePMD-kit documentation as a comprehensive reference.
In this case, the Deep Potential (DP) model was generated using the DeePMD-kit package (v2.2.1).
Current path is: /personal/bohr/DeePMD-kit-Tutorial-21f1/v1
Let's take a look at the downloaded DeePMD-kit_Tutorial folder.
DeePMD-kit_Tutorial
├── 00.data
├── 01.train
├── 01.train.finished
├── 02.lmp
└── 02.lmp.finished

5 directories, 0 files
The DeePMD-kit_Tutorial folder has three main subfolders (00.data, 01.train, and 02.lmp), plus 01.train.finished and 02.lmp.finished, which hold finished versions of the corresponding steps.
- The 00.data folder is used to store training and testing data.
- The 01.train folder contains example scripts for training models using DeePMD-kit.
- The 01.train.finished folder includes the complete results of the training process.
- The 02.lmp folder contains example scripts for molecular dynamics simulations using LAMMPS.
Let's first take a look at the DeePMD-kit_Tutorial/00.data folder.
DeePMD-kit_Tutorial/00.data
├── abacus_md
├── training_data
└── validation_data

3 directories, 0 files
DeePMD-kit's training data originates from first-principles calculation data, including atomic types, simulation cells, atomic coordinates, atomic forces, system energies, and virials.
Initially, the 00.data folder contains only the abacus_md folder, which holds data obtained through ab initio molecular dynamics (AIMD) simulations using ABACUS. The training_data and validation_data folders shown in the listing above are generated from it in the next step. In this tutorial, we have already completed the ab initio molecular dynamics calculations for the methane molecule for you.
Detailed information about ABACUS can be found in its documentation.
DeePMD-kit uses a compressed data format. All training data must be converted into this format before they can be used in DeePMD-kit. The format is explained in detail in the DeePMD-kit manual, which can be found on DeePMD-kit's GitHub.
We provide a convenient tool, dpdata, which can convert data generated by VASP, CP2K, Gaussian, Quantum Espresso, ABACUS, and LAMMPS into DeePMD-kit's compressed format.
A snapshot of a molecular system that contains computational data information is called a frame. A data system comprises many frames sharing the same number of atoms and atom types.
For example, a molecular dynamics trajectory can be converted into a data system, where each timestep corresponds to one frame in the system.
Next, we use the dpdata tool to randomly split the data in abacus_md into training and validation data.
# the data contains 201 frames
# the training data contains 161 frames
# the validation data contains 40 frames
As you can see, 161 frames are picked as training data, and the other 40 frames are validation data.
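The split described above can be sketched as follows. This is a minimal sketch, not necessarily the exact cell used in the notebook: the helper names split_indices and split_abacus_md and the fixed random seed are illustrative assumptions. The dpdata calls used (LabeledSystem with fmt="abacus/md", sub_system, and to_deepmd_npy) belong to dpdata's public interface, and dpdata is imported lazily so that the index logic can be checked without it.

```python
import random


def split_indices(n_frames, n_validation, seed=0):
    """Randomly pick n_validation frame indices; the rest are training frames."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    validation = sorted(rng.sample(range(n_frames), n_validation))
    chosen = set(validation)
    training = [i for i in range(n_frames) if i not in chosen]
    return training, validation


def split_abacus_md(src, train_dir, valid_dir, n_validation=40):
    """Load an ABACUS MD trajectory and write DeePMD-kit training/validation sets."""
    import dpdata  # imported here so the index logic above runs without dpdata

    data = dpdata.LabeledSystem(src, fmt="abacus/md")
    training, validation = split_indices(len(data), n_validation)
    data.sub_system(training).to_deepmd_npy(train_dir)
    data.sub_system(validation).to_deepmd_npy(valid_dir)
    return len(training), len(validation)


# Index logic only, using the frame counts reported in this tutorial.
train_idx, valid_idx = split_indices(201, 40)
print(f"# the training data contains {len(train_idx)} frames")
print(f"# the validation data contains {len(valid_idx)} frames")
```

Calling split_abacus_md with DeePMD-kit_Tutorial/00.data/abacus_md as the source and the training_data and validation_data folders as targets would then write the two data systems.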
Let's take another look at the 00.data folder, where new files have been generated, which are the training and validation sets required for Deep Potential training with DeePMD-kit.
DeePMD-kit_Tutorial/00.data/
├── abacus_md
├── training_data
└── validation_data

3 directories, 0 files
DeePMD-kit_Tutorial/00.data/training_data
├── set.000
├── type.raw
└── type_map.raw

1 directory, 2 files
The functions of these files are as follows:
- set.000: It is a directory that contains compressed format data (NumPy compressed arrays).
- type.raw: It is a file that contains the types of atoms (represented as integers).
- type_map.raw: It is a file that contains the names of the types of atoms.
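To make the layout concrete, the sketch below lists the array shapes one would expect inside set.000 for this system. It assumes the standard DeePMD-kit file names coord.npy, force.npy, energy.npy, and box.npy, which are not listed in this tutorial, and uses the 161 five-atom training frames; expected_shapes is a hypothetical helper.

```python
def expected_shapes(n_frames, n_atoms):
    """Expected NumPy array shapes in a set.* directory (one row per frame)."""
    return {
        "coord.npy": (n_frames, n_atoms * 3),   # x, y, z of every atom
        "force.npy": (n_frames, n_atoms * 3),   # force components of every atom
        "energy.npy": (n_frames,),              # one total energy per frame
        "box.npy": (n_frames, 9),               # 3x3 cell vectors, flattened
    }


shapes = expected_shapes(n_frames=161, n_atoms=5)
for name, shape in shapes.items():
    print(f"{name:12s} {shape}")
```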
Let's take a look at these files.
Let's have a look at type.raw:
0 0 0 0 1
This tells us there are 5 atoms in this example, 4 atoms represented by type "0", and 1 atom represented by type "1".
Sometimes one needs to map the integer types to atom names. The mapping is given by the file type_map.raw:
H C
This tells us that type "0" is named "H" and type "1" is named "C".
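The mapping can be applied in a few lines of Python. This is only an illustration: the two strings below stand in for the contents of type.raw and type_map.raw shown above, and the variable names are arbitrary.

```python
from collections import Counter

type_raw = "0 0 0 0 1"      # contents of type.raw
type_map_raw = "H C"        # contents of type_map.raw

atom_types = [int(t) for t in type_raw.split()]
type_names = type_map_raw.split()

# Translate each integer type into its element name.
atom_names = [type_names[t] for t in atom_types]
composition = Counter(atom_names)

print(atom_names)
print(dict(composition))
```

The result is four hydrogen atoms and one carbon atom, that is, one methane molecule.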
More detailed documentation on using dpdata for data conversion can be found in the dpdata documentation.
{
    "_comment": " model parameters",
    "model": {
        "type_map": ["H", "C"],
        "descriptor": {
            "type": "se_e2_a",
            "sel": "auto",
            "rcut_smth": 0.50,
            "rcut": 6.00,
            "neuron": [25, 50, 100],
            "resnet_dt": false,
            "axis_neuron": 16,
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net": {
            "neuron": [240, 240, 240],
            "resnet_dt": true,
            "seed": 1,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },
    "learning_rate": {
        "type": "exp",
        "decay_steps": 50,
        "start_lr": 0.001,
        "stop_lr": 3.51e-8,
        "_comment": "that's all"
    },
    "loss": {
        "type": "ener",
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "_comment": " that's all"
    },
    "training": {
        "training_data": {
            "systems": ["../00.data/training_data"],
            "batch_size": "auto",
            "_comment": "that's all"
        },
        "validation_data": {
            "systems": ["../00.data/validation_data"],
            "batch_size": "auto",
            "numb_btch": 1,
            "_comment": "that's all"
        },
        "numb_steps": 10000,
        "seed": 10,
        "disp_file": "lcurve.out",
        "disp_freq": 200,
        "save_freq": 1000,
        "_comment": "that's all"
    },
    "_comment": "that's all"
}
DeePMD-kit requires a JSON-format file to specify the parameters for training.
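Because a stray quote or a trailing comma makes such a file unparseable, it can help to load it with Python's json module before training. The sketch below is illustrative only: check_input is a hypothetical helper, and the four section names it looks for are taken from the input file shown in this tutorial.

```python
import json

REQUIRED_SECTIONS = ("model", "learning_rate", "loss", "training")


def check_input(text):
    """Parse a DeePMD-kit style input and return any missing top-level sections."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    return [name for name in REQUIRED_SECTIONS if name not in config]


good = '{"model": {}, "learning_rate": {}, "loss": {}, "training": {}}'
bad = '{"training": {"save_freq": 1000,}}'  # trailing comma is invalid JSON

missing = check_input(good)
print("missing sections:", missing)

try:
    check_input(bad)
    bad_is_valid = True
except json.JSONDecodeError as err:
    bad_is_valid = False
    print("invalid JSON:", err.msg)
```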
In the model section, the parameters of embedding and fitting networks are specified.
"model":{
"type_map": ["H", "C"],
"descriptor":{
"type": "se_e2_a",
"rcut": 6.00,
"rcut_smth": 0.50,
"sel": "auto",
"neuron": [25, 50, 100],
"resnet_dt": false,
"axis_neuron": 16,
"seed": 1,
"_comment": "that's all"
},
"fitting_net":{
"neuron": [240, 240, 240],
"resnet_dt": true,
"seed": 1,
"_comment": "that's all"
},
"_comment": "that's all"
},
The explanation for some of the parameters is as follows:
Parameter | Explanation |
---|---|
type_map | the name of each type of atom |
descriptor > type | the type of descriptor |
descriptor > rcut | cut-off radius |
descriptor > rcut_smth | where the smoothing starts |
descriptor > sel | the maximum number of type i atoms in the cut-off radius |
descriptor > neuron | size of the embedding neural network |
descriptor > axis_neuron | the size of the submatrix of G (embedding matrix) |
fitting_net > neuron | size of the fitting neural network |
The se_e2_a descriptor is used to train the DP model. The neuron items set the sizes of the embedding and fitting networks to [25, 50, 100] and [240, 240, 240], respectively. The components of the local environment go smoothly to zero between 0.5 Å (rcut_smth) and 6 Å (rcut).
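The smooth cutoff can be illustrated numerically. The sketch below assumes the switching function given in the DeePMD-kit documentation for smooth-edition descriptors: s(r) = 1/r below rcut_smth, a fifth-order polynomial taper between rcut_smth and rcut, and 0 beyond rcut. The function name switch is illustrative and not part of DeePMD-kit.

```python
def switch(r, rcut_smth=0.5, rcut=6.0):
    """Weight s(r) applied to a neighbor at distance r (in angstroms)."""
    if r < rcut_smth:
        return 1.0 / r
    if r < rcut:
        x = (r - rcut_smth) / (rcut - rcut_smth)
        # Polynomial taper: equals 1 at x = 0 and 0 at x = 1, with smooth ends.
        return (x**3 * (-6.0 * x**2 + 15.0 * x - 10.0) + 1.0) / r
    return 0.0


for r in (0.25, 0.5, 3.25, 6.0):
    print(f"r = {r:4.2f}  s(r) = {switch(r):.4f}")
```

A neighbor at the midpoint of the taper (3.25 Å) keeps half of its 1/r weight, and a neighbor at or beyond 6 Å contributes nothing.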
The following are the parameters that specify the learning rate and loss function.
"learning_rate" :{
"type": "exp",
"decay_steps": 50,
"start_lr": 0.001,
"stop_lr": 3.51e-8,
"_comment": "that's all"
},
"loss" :{
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": "that's all"
},
In the loss function, pref_e increases from 0.02 to 1 and pref_f decreases from 1000 to 1 progressively, which means that the force term dominates at the beginning, while the energy term becomes important at the end. This strategy is very effective and reduces the total training time. Both start_pref_v and limit_pref_v are set to 0, indicating that no virial data are included in the training process. The starting learning rate, stop learning rate, and decay steps are set to 0.001, 3.51e-8, and 50, respectively. The model is trained for 10000 steps.
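The schedule implied by these numbers can be reproduced with a short calculation. The sketch below assumes the staircase exponential decay and the prefactor interpolation p(t) = p_limit * (1 - lr(t)/lr_start) + p_start * lr(t)/lr_start described in the DeePMD-kit documentation; the function and constant names are illustrative.

```python
START_LR, STOP_LR = 1e-3, 3.51e-8
DECAY_STEPS, NUMB_STEPS = 50, 10000

# Decay rate chosen so that the learning rate reaches STOP_LR at NUMB_STEPS.
decay_rate = (STOP_LR / START_LR) ** (DECAY_STEPS / NUMB_STEPS)


def learning_rate(step):
    """Learning rate at a given step; it drops once every DECAY_STEPS steps."""
    return START_LR * decay_rate ** (step // DECAY_STEPS)


def prefactor(step, start_pref, limit_pref):
    """Loss prefactor, moving from start_pref to limit_pref as the rate decays."""
    ratio = learning_rate(step) / START_LR
    return limit_pref * (1.0 - ratio) + start_pref * ratio


print(f"decay_rate = {decay_rate:.6f}")
for step in (0, 5000, 10000):
    print(step,
          f"lr={learning_rate(step):.2e}",
          f"pref_e={prefactor(step, 0.02, 1):.3f}",
          f"pref_f={prefactor(step, 1000, 1):.3f}")
```

The computed decay rate agrees with the value 0.950006 that DeePMD-kit reports in the training log.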
The training parameters are given in the following:
"training" : {
"training_data": {
"systems": ["../00.data/training_data"],
"batch_size": "auto",
"_comment": "that's all"
},
"validation_data":{
"systems": ["../00.data/validation_data/"],
"batch_size": "auto",
"numb_btch": 1,
"_comment": "that's all"
},
"numb_steps": 10000,
"seed": 10,
"disp_file": "lcurve.out",
"disp_freq": 200,
"save_freq": 1000,
"_comment": "that's all"
},
More detailed documentation on the training parameters can be found in the DeePMD-kit documentation.
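The effect of "batch_size": "auto" can be worked out by hand. The sketch below assumes the documented rule that "auto" picks the smallest batch size for which batch_size times the number of atoms is at least 32; auto_batch_size is a hypothetical helper, and the frame counts (161 and 40) come from the data split earlier in this tutorial.

```python
import math


def auto_batch_size(n_atoms, min_atoms_per_batch=32):
    """Smallest batch size with batch_size * n_atoms >= min_atoms_per_batch."""
    return math.ceil(min_atoms_per_batch / n_atoms)


batch_size = auto_batch_size(n_atoms=5)
n_train_batches = 161 // batch_size   # full batches in the training set
n_valid_batches = 40 // batch_size    # full batches in the validation set
print(batch_size, n_train_batches, n_valid_batches)
```

These values agree with the bch_sz and n_bch columns that DeePMD-kit prints in its data system summary for this run.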
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
DEEPMD INFO Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step)
DEEPMD INFO training data with min nbor dist: 1.045920568611028
DEEPMD INFO training data with max nbor size: [4 1]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1678943793317/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO source : v2.2.1
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: 3ac8c4c7
DEEPMD INFO source commit at: 2023-03-16 12:33:24 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build variant: cuda
DEEPMD INFO build with tf inc: /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/../../../../include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: bohrium-11303-1108075
DEEPMD INFO computing device: cpu:0
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO Count of visible GPU: 0
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/training_data 5 7 23 1.000 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation -----------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/validation_data 5 7 5 1.000 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO training without frame parameter
DEEPMD INFO data stating... (this step may take long time)
DEEPMD INFO built lr
DEEPMD INFO built network
DEEPMD INFO built training
DEEPMD INFO initialize model from scratch
DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.950006, final lr will be 3.51e-08
DEEPMD INFO batch 200 training time 5.07 s, testing time 0.02 s
DEEPMD INFO batch 400 training time 3.94 s, testing time 0.02 s
DEEPMD INFO batch 600 training time 3.96 s, testing time 0.02 s
DEEPMD INFO batch 800 training time 3.85 s, testing time 0.02 s
DEEPMD INFO batch 1000 training time 3.96 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 1200 training time 4.08 s, testing time 0.02 s
DEEPMD INFO batch 1400 training time 3.94 s, testing time 0.02 s
DEEPMD INFO batch 1600 training time 3.93 s, testing time 0.02 s
DEEPMD INFO batch 1800 training time 3.84 s, testing time 0.02 s
DEEPMD INFO batch 2000 training time 3.91 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 2200 training time 3.96 s, testing time 0.02 s
DEEPMD INFO batch 2400 training time 3.89 s, testing time 0.02 s
DEEPMD INFO batch 2600 training time 3.96 s, testing time 0.02 s
DEEPMD INFO batch 2800 training time 3.85 s, testing time 0.02 s
DEEPMD INFO batch 3000 training time 3.97 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 3200 training time 3.90 s, testing time 0.02 s
DEEPMD INFO batch 3400 training time 3.93 s, testing time 0.02 s
DEEPMD INFO batch 3600 training time 3.90 s, testing time 0.02 s
DEEPMD INFO batch 3800 training time 3.87 s, testing time 0.02 s
DEEPMD INFO batch 4000 training time 3.99 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 4200 training time 3.90 s, testing time 0.02 s
DEEPMD INFO batch 4400 training time 3.87 s, testing time 0.02 s
DEEPMD INFO batch 4600 training time 3.96 s, testing time 0.02 s
DEEPMD INFO batch 4800 training time 3.89 s, testing time 0.02 s
DEEPMD INFO batch 5000 training time 3.92 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 5200 training time 3.87 s, testing time 0.02 s
DEEPMD INFO batch 5400 training time 3.91 s, testing time 0.02 s
DEEPMD INFO batch 5600 training time 3.94 s, testing time 0.02 s
DEEPMD INFO batch 5800 training time 3.93 s, testing time 0.02 s
DEEPMD INFO batch 6000 training time 3.93 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 6200 training time 3.93 s, testing time 0.02 s
DEEPMD INFO batch 6400 training time 3.86 s, testing time 0.02 s
DEEPMD INFO batch 6600 training time 3.91 s, testing time 0.02 s
DEEPMD INFO batch 6800 training time 3.93 s, testing time 0.02 s
DEEPMD INFO batch 7000 training time 4.04 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 7200 training time 3.94 s, testing time 0.02 s
DEEPMD INFO batch 7400 training time 3.98 s, testing time 0.02 s
DEEPMD INFO batch 7600 training time 3.91 s, testing time 0.02 s
DEEPMD INFO batch 7800 training time 3.95 s, testing time 0.02 s
DEEPMD INFO batch 8000 training time 3.88 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 8200 training time 3.87 s, testing time 0.02 s
DEEPMD INFO batch 8400 training time 3.96 s, testing time 0.02 s
DEEPMD INFO batch 8600 training time 3.97 s, testing time 0.02 s
DEEPMD INFO batch 8800 training time 4.01 s, testing time 0.02 s
DEEPMD INFO batch 9000 training time 3.97 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 9200 training time 3.94 s, testing time 0.02 s
DEEPMD INFO batch 9400 training time 3.89 s, testing time 0.02 s
DEEPMD INFO batch 9600 training time 3.87 s, testing time 0.02 s
DEEPMD INFO batch 9800 training time 4.11 s, testing time 0.02 s
DEEPMD INFO batch 10000 training time 3.92 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO average training time: 0.0196 s/batch (exclude first 200 batches)
DEEPMD INFO finished training
DEEPMD INFO wall time: 209.398 s
On the screen, you will see a summary of the data system(s):
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: training ----------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/training_data 5 7 23 1.000 T
DEEPMD INFO -------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation ----------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/validation_data 5 7 5 1.000 T
DEEPMD INFO -------------------------------------------------------------------------
and the starting and final learning rates of this training:
DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.950006, final lr will be 3.51e-08
If everything works fine, you will see information printed on the screen every 200 steps (the disp_freq setting), like
DEEPMD INFO batch 200 training time 6.04 s, testing time 0.02 s
DEEPMD INFO batch 400 training time 4.80 s, testing time 0.02 s
DEEPMD INFO batch 600 training time 4.80 s, testing time 0.02 s
DEEPMD INFO batch 800 training time 4.78 s, testing time 0.02 s
DEEPMD INFO batch 1000 training time 4.77 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO batch 1200 training time 4.47 s, testing time 0.02 s
DEEPMD INFO batch 1400 training time 4.49 s, testing time 0.02 s
DEEPMD INFO batch 1600 training time 4.45 s, testing time 0.02 s
DEEPMD INFO batch 1800 training time 4.44 s, testing time 0.02 s
DEEPMD INFO batch 2000 training time 4.46 s, testing time 0.02 s
DEEPMD INFO saved checkpoint model.ckpt
These lines report the time spent on training and testing. Every 1000 batches (the save_freq setting), the model is saved to TensorFlow's checkpoint file model.ckpt. At the same time, the training and validation errors are written to the file lcurve.out every 200 batches (the disp_freq setting).
The file contains 8 columns. From left to right, they are the training step, the validation loss, the training loss, the root mean square (RMS) validation error of the energy, the RMS training error of the energy, the RMS validation error of the force, the RMS training error of the force, and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system.
$ head -n 2 lcurve.out
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 2.02e+01 1.51e+01 1.37e-01 1.41e-01 6.40e-01 4.79e-01 1.0e-03
and
$ tail -n 2 lcurve.out
9800 2.45e-02 4.02e-02 3.20e-04 3.88e-04 2.40e-02 3.94e-02 4.3e-08
10000 4.60e-02 3.76e-02 8.65e-04 5.35e-04 4.52e-02 3.69e-02 3.5e-08
Columns 4 and 5 present the energy validation and training errors, and columns 6 and 7 present the force validation and training errors.
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
# If there is no available reference data, rmse_*_{val,trn} will print nan
9800 2.40e-02 2.82e-02 2.27e-04 3.77e-04 2.35e-02 2.76e-02 4.3e-08
10000 3.82e-02 3.02e-02 6.48e-04 3.29e-04 3.75e-02 2.97e-02 3.5e-08
The loss function can be visualized to monitor the training process.
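One possible way to do this is sketched below. It is not necessarily the plotting cell of the original notebook: read_lcurve and plot_lcurve are hypothetical helpers, the output name lcurve.png is arbitrary, and matplotlib is imported lazily so that the parser can be tried without it. The sample text reuses the lcurve.out lines shown in this tutorial.

```python
def read_lcurve(text):
    """Parse lcurve.out text into {column name: list of floats}."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            if fields and fields[0] == "step":   # header line with column names
                names = fields
            continue                             # skip other comment lines
        rows.append([float(v) for v in line.split()])
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def plot_lcurve(curve, path="lcurve.png"):
    """Plot every error column against the training step on log-log axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    for name, values in curve.items():
        if name not in ("step", "lr"):
            plt.loglog(curve["step"], values, label=name)
    plt.xlabel("Step")
    plt.ylabel("Loss")
    plt.legend()
    plt.savefig(path)


sample = """\
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
# If there is no available reference data, rmse_*_{val,trn} will print nan
9800 2.40e-02 2.82e-02 2.27e-04 3.77e-04 2.35e-02 2.76e-02 4.3e-08
10000 3.82e-02 3.02e-02 6.48e-04 3.29e-04 3.75e-02 2.97e-02 3.5e-08
"""
curve = read_lcurve(sample)
print(curve["step"], curve["rmse_f_val"])
```

With the real file, passing the contents of lcurve.out to read_lcurve and the result to plot_lcurve would draw all six error curves; a log-log scale is used because both the step count and the errors span several orders of magnitude.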
DEEPMD INFO The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
DEEPMD INFO 1222 ops in the final graph.
Freezing outputs a model file named graph.pb in the current directory.
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : ../00.data/validation_data
DEEPMD INFO # number of test data : 40
DEEPMD INFO Energy MAE : 1.761527e-03 eV
DEEPMD INFO Energy RMSE : 2.288492e-03 eV
DEEPMD INFO Energy MAE/Natoms : 3.523054e-04 eV
DEEPMD INFO Energy RMSE/Natoms : 4.576985e-04 eV
DEEPMD INFO Force MAE : 2.518540e-02 eV/A
DEEPMD INFO Force RMSE : 3.299285e-02 eV/A
DEEPMD INFO Virial MAE : 3.007277e-02 eV
DEEPMD INFO Virial RMSE : 4.115609e-02 eV
DEEPMD INFO Virial MAE/Natoms : 6.014553e-03 eV
DEEPMD INFO Virial RMSE/Natoms : 8.231219e-03 eV
DEEPMD INFO # -----------------------------------------------
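The per-atom figures in the dp test summary appear to be the whole-frame errors divided by the number of atoms per frame. The following is a minimal sketch that checks this for the energy metrics; the helper parse_metrics and the constant N_ATOMS are illustrative names, not part of DeePMD-kit, and the value 5 assumes one methane molecule (CH4) per frame.

```python
import re

# Summary lines copied from the dp test output above.
DP_TEST_SUMMARY = """\
DEEPMD INFO Energy MAE : 1.761527e-03 eV
DEEPMD INFO Energy RMSE : 2.288492e-03 eV
DEEPMD INFO Energy MAE/Natoms : 3.523054e-04 eV
DEEPMD INFO Energy RMSE/Natoms : 4.576985e-04 eV
"""

# Assumption: each frame holds one CH4 molecule, i.e. 5 atoms.
N_ATOMS = 5


def parse_metrics(text):
    """Map metric name to value for lines shaped like
    'DEEPMD INFO <name> : <value> <unit>' (hypothetical helper)."""
    pattern = re.compile(r"DEEPMD INFO\s+(.+?)\s*:\s*([-+]?\d\.\d+e[-+]\d+)\s+\S+")
    metrics = {}
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


metrics = parse_metrics(DP_TEST_SUMMARY)
for name in ("Energy MAE", "Energy RMSE"):
    per_atom = metrics[name] / N_ATOMS
    reported = metrics[name + "/Natoms"]
    print(f"{name}: computed {per_atom:.6e} eV/atom, reported {reported:.6e} eV/atom")
```

Comparing per-atom errors rather than whole-frame errors makes results easier to compare across systems of different sizes.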
The correlation between predicted data and original data can also be calculated.
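As a rough illustration of what such a correlation check involves, the sketch below computes the Pearson correlation coefficient and the RMSE between two energy lists in plain Python. The numbers are made-up placeholders, not results from this tutorial, and the function names are illustrative; in practice the reference values would come from the labeled dataset and the predictions from the frozen model.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical energies in eV, for illustration only.
reference = [-219.774, -219.769, -219.784, -219.787, -219.782]
predicted = [-219.775, -219.768, -219.783, -219.789, -219.781]

print(f"Pearson r = {pearson_r(reference, predicted):.4f}")
print(f"RMSE      = {rmse(reference, predicted):.4e} eV")
```

A coefficient close to 1 together with a small RMSE suggests the model reproduces the reference data well; a scatter plot of predicted against reference values conveys the same information visually.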
7 Run MD with LAMMPS
The model can drive molecular dynamics in LAMMPS.
DeePMD-kit_Tutorial
.
├── ch4.dump
├── conf.lmp
├── graph.pb
└── in.lammps

0 directories, 4 files
Here conf.lmp gives the initial configuration of a gas-phase methane MD simulation, and in.lammps is the LAMMPS input script. Inspecting in.lammps shows that it is a fairly standard LAMMPS input file for an MD simulation, with only two exceptions:
pair_style deepmd graph.pb
pair_coeff * *
where the pair style deepmd is invoked and the model file graph.pb is provided, which means the atomic interactions will be computed by the DP model stored in graph.pb.
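Before launching a run, it can be useful to confirm which model file an input script points to. The sketch below scans script text for pair_style deepmd lines and collects the model files named on them. The function deepmd_models is a hypothetical helper, and treating every argument ending in .pb as a model file is an assumption made for illustration.

```python
def deepmd_models(script_text):
    """Return model files named on 'pair_style deepmd' lines (hypothetical helper)."""
    models = []
    for line in script_text.splitlines():
        # Drop trailing comments, then split the command into tokens.
        tokens = line.split("#", 1)[0].split()
        if tokens[:2] == ["pair_style", "deepmd"]:
            # Assumption: model files are the arguments ending in ".pb".
            models.extend(tok for tok in tokens[2:] if tok.endswith(".pb"))
    return models


# The two DP-specific lines from in.lammps.
script = """\
pair_style deepmd graph.pb
pair_coeff * *
"""

print(deepmd_models(script))
```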
In an environment with a compatible version of LAMMPS, the deep potential molecular dynamics simulation can be run via
lmp -i in.lammps
LAMMPS (23 Jun 2022 - Update 1)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Loaded 1 plugins from /opt/deepmd-kit-2.2.1/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (10.114259 10.263124 10.216793) with tilt (0.036749877 0.13833062 -0.056322169)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  5 atoms
  read_data CPU = 0.003 seconds
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to: /opt/deepmd-kit-2.2.1
  source: v2.2.1
  source branch: HEAD
  source commit: 3ac8c4c7
  source commit at: 2023-03-16 12:33:24 +0800
  surpport model ver.:1.1
  build variant: cuda
  build with tf inc: /opt/deepmd-kit-2.2.1/include;/opt/deepmd-kit-2.2.1/include
  build with tf lib: /opt/deepmd-kit-2.2.1/lib/libtensorflow_cc.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at: /opt/deepmd-kit-2.2.1
DeePMD-kit: Successfully load libcudart.so
  >>> Info of model(s):
  using 1 model(s): graph.pb
  rcut in model: 6
  ntypes in model: 2

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7
  ghost atom cutoff = 7
  binsize = 3.5, bins = 3 3 3
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style : metal
  Current step : 0
  Time step : 0.001
Per MPI rank memory allocation (min/avg/max) = 3.809 | 3.809 | 3.809 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
0 -219.77406 0.025852029 -219.74821 50 -779.25188 1060.5429
100 -219.7691 0.020797437 -219.7483 40.223994 -637.14305 1060.5429
200 -219.77444 0.024939285 -219.7495 48.234676 -320.40098 1060.5429
300 -219.78439 0.033072979 -219.75132 63.965925 43.026442 1060.5429
400 -219.78739 0.034550668 -219.75284 66.823899 351.67696 1060.5429
500 -219.78236 0.028176993 -219.75419 54.496675 666.75737 1060.5429
600 -219.78253 0.025728877 -219.7568 49.761815 711.163 1060.5429
700 -219.78894 0.028382389 -219.76055 54.893929 479.13643 1060.5429
800 -219.78903 0.024859643 -219.76417 48.080642 83.077656 1060.5429
900 -219.78291 0.015448216 -219.76746 29.878151 -300.47299 1060.5429
1000 -219.78076 0.009958727 -219.7708 19.261016 -547.51181 1060.5429
1100 -219.78482 0.010708236 -219.77412 20.710629 -528.61764 1060.5429
1200 -219.79088 0.014010222 -219.77687 27.096949 -271.00462 1060.5429
1300 -219.79342 0.014851244 -219.77856 28.723555 113.17225 1060.5429
1400 -219.79258 0.013154759 -219.77943 25.442411 429.376 1060.5429
1500 -219.79351 0.013439034 -219.78007 25.992223 502.07008 1060.5429
1600 -219.79556 0.015244258 -219.78032 29.483679 283.58943 1060.5429
1700 -219.79243 0.012780435 -219.77965 24.718437 -118.65863 1060.5429
1800 -219.78753 0.0093475149 -219.77818 18.078881 -440.59299 1060.5429
1900 -219.78644 0.010485894 -219.77595 20.280601 -548.35192 1060.5429
2000 -219.78657 0.014291536 -219.77228 27.641033 -379.53425 1060.5429
2100 -219.78582 0.019271324 -219.76655 37.272363 3.8622352 1060.5429
2200 -219.78342 0.023480599 -219.75994 45.413455 421.70988 1060.5429
2300 -219.7843 0.029382788 -219.75492 56.828785 667.59953 1060.5429
2400 -219.78777 0.035686462 -219.75209 69.020621 708.56999 1060.5429
2500 -219.78484 0.034331665 -219.75051 66.400331 560.05263 1060.5429
2600 -219.781 0.031551297 -219.74945 61.022863 272.92453 1060.5429
2700 -219.77767 0.028694937 -219.74898 55.498424 -150.97111 1060.5429
2800 -219.77576 0.026724406 -219.74903 51.68725 -531.59493 1060.5429
2900 -219.77353 0.02427308 -219.74926 46.946179 -733.24223 1060.5429
3000 -219.77341 0.023395232 -219.75001 45.248349 -706.66882 1060.5429
3100 -219.77969 0.028192211 -219.7515 54.526108 -543.66237 1060.5429
3200 -219.78698 0.033417316 -219.75356 64.631901 -242.21772 1060.5429
3300 -219.79019 0.034565479 -219.75562 66.852545 120.70082 1060.5429
3400 -219.78417 0.026911374 -219.75726 52.048862 557.3674 1060.5429
3500 -219.77449 0.01580688 -219.75868 30.571837 766.81564 1060.5429
3600 -219.77669 0.015991276 -219.7607 30.928474 683.18009 1060.5429
3700 -219.78572 0.022618548 -219.7631 43.746176 290.23355 1060.5429
3800 -219.79279 0.027795739 -219.76499 53.759299 -181.24253 1060.5429
3900 -219.78826 0.022283131 -219.76598 43.097451 -515.8167 1060.5429
4000 -219.78236 0.015546589 -219.76681 30.068411 -608.47483 1060.5429
4100 -219.7863 0.018324324 -219.76797 35.440785 -500.82916 1060.5429
4200 -219.79138 0.022304319 -219.76908 43.138431 -200.46212 1060.5429
4300 -219.78855 0.019344221 -219.7692 37.413352 235.01595 1060.5429
4400 -219.78187 0.013421854 -219.76845 25.958995 569.04151 1060.5429
4500 -219.77866 0.011098531 -219.76756 21.465493 679.98587 1060.5429
4600 -219.78476 0.018276314 -219.76649 35.347929 462.8788 1060.5429
4700 -219.79061 0.026535396 -219.76407 51.321689 -44.12474 1060.5429
4800 -219.78698 0.027259367 -219.75972 52.721911 -476.42292 1060.5429
4900 -219.77955 0.025502005 -219.75405 49.323023 -698.38931 1060.5429
5000 -219.77457 0.026622642 -219.74795 51.49043 -707.85111 1060.5429
Loop time of 12.1621 on 1 procs for 5000 steps with 5 atoms

Performance: 35.520 ns/day, 0.676 hours/ns, 411.113 timesteps/s
213.7% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 12.128 | 12.128 | 12.128 | 0.0 | 99.72
Neigh | 0.0051578 | 0.0051578 | 0.0051578 | 0.0 | 0.04
Comm | 0.00889 | 0.00889 | 0.00889 | 0.0 | 0.07
Output | 0.0043063 | 0.0043063 | 0.0043063 | 0.0 | 0.04
Modify | 0.011562 | 0.011562 | 0.011562 | 0.0 | 0.10
Other | | 0.00382 | | | 0.03

Nlocal: 5 ave 5 max 5 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 130 ave 130 max 130 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 20 ave 20 max 20 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 20
Ave neighs/atom = 4
Neighbor list builds = 500
Dangerous builds not checked
Total wall time: 0:00:13