[DeePMD-kit v3 Tutorial 1] Multi-Backend Framework: User Guide

jinzhe.zeng@rutgers.edu

Updated 2024-11-24

Recommended image: Basic Image:ubuntu22.04-py3.10

Recommended machine type: c12_m92_1 * NVIDIA V100

Installation

Multi-backend training, freezing, and compression

Model conversion

Model testing

LAMMPS molecular dynamics speed benchmark

Installation

[1]

!nvidia-smi

Sun Nov 24 13:02:01 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:09.0 Off |                    0 |
| N/A   33C    P0    36W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here we use a single V100 node for the demonstration and install the GPU build of DeePMD-kit.

[2]

!wget https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0 -O deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0

!wget https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1 -O deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1

!cat deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0 deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1 > deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh

--2024-11-24 13:02:28--  https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/4858a092-c590-4b8f-a9d9-a3ee36f1e2eb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050228Z&X-Amz-Expires=300&X-Amz-Signature=a5676bb7269222f4c3e1ac0f25ac57d4a7d74e8846a3eeda41f5b2b7777f537b&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0&response-content-type=application%2Foctet-stream [following]
--2024-11-24 13:02:28--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/4858a092-c590-4b8f-a9d9-a3ee36f1e2eb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050228Z&X-Amz-Expires=300&X-Amz-Signature=a5676bb7269222f4c3e1ac0f25ac57d4a7d74e8846a3eeda41f5b2b7777f537b&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0&response-content-type=application%2Foctet-stream
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1593874464 (1.5G) [application/octet-stream]
Saving to: ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0’

deepmd-kit-3.0.0-cu 100%[===================>]   1.48G  3.50MB/s    in 6m 38s  

2024-11-24 13:09:08 (3.82 MB/s) - ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0’ saved [1593874464/1593874464]

--2024-11-24 13:09:08--  https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.7, 10.255.254.18
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/ab02be0e-f4b9-44db-a593-bfcc82fb194b?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050909Z&X-Amz-Expires=300&X-Amz-Signature=d1b36bf592d4fa6719d18177f8136d42dd05f4a454fc730ce9ed337c756bedad&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1&response-content-type=application%2Foctet-stream [following]
--2024-11-24 13:09:09--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/ab02be0e-f4b9-44db-a593-bfcc82fb194b?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050909Z&X-Amz-Expires=300&X-Amz-Signature=d1b36bf592d4fa6719d18177f8136d42dd05f4a454fc730ce9ed337c756bedad&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1&response-content-type=application%2Foctet-stream
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1593874465 (1.5G) [application/octet-stream]
Saving to: ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1’

deepmd-kit-3.0.0-cu 100%[===================>]   1.48G  5.36MB/s    in 5m 53s  

2024-11-24 13:15:04 (4.30 MB/s) - ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1’ saved [1593874465/1593874465]
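
The installer is published as two halves, and the cat command above joins them byte for byte in order. The following is a minimal Python sketch of the same operation, not part of the original notebook; the function name join_parts is hypothetical. It returns the size of the joined file so the result can be compared with the sizes wget reported.

```python
import shutil
from pathlib import Path


def join_parts(parts, output):
    """Concatenate the files in `parts`, in order, into `output`.

    Returns the size in bytes of the joined file.
    """
    output = Path(output)
    with output.open("wb") as dst:
        for part in parts:
            with Path(part).open("rb") as src:
                # copyfileobj streams in chunks, so multi-gigabyte parts are fine
                shutil.copyfileobj(src, dst)
    return output.stat().st_size
```

For the two downloads above, the joined installer should be 1593874464 + 1593874465 = 3187748929 bytes. A smaller result suggests that one of the downloads was cut short.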

[3]

!sh deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh -b

PREFIX=/root/deepmd-kit
Unpacking payload ...
Notes:
The off-line packages and conda packages require the GNU C Library 2.17 or above[1]. The GPU version requires compatible NVIDIA driver to be installed in advance[2]. It is possible to force conda to override detection when installation[3] (such as CONDA_OVERRIDE_CUDA), but these requirements are still necessary during runtime.

[1] The GNU C Library. https://www.gnu.org/software/libc/
[2] Minor Version Compatibility. NVIDIA Data Center GPU Driver Documentation. https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility
[3] Overriding detected packages. conda documentation. https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
https://docs.nvidia.com/deeplearning/cudnn/sla/index.html

To enable CUDA support, UCX requires the CUDA Runtime library (libcudart).
The library can be installed with the appropriate command below:

* For CUDA 11, run: conda install cudatoolkit cuda-version=11
* For CUDA 12, run: conda install cuda-cudart cuda-version=12

To enable CUDA support, please follow UCX's instruction above.

To additionally enable NCCL support, run: conda install nccl

On Linux, Open MPI is built with CUDA awareness but it is disabled by default.
To enable it, please set the environment variable
OMPI_MCA_opal_cuda_support=true
before launching your MPI processes.
Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via
UCX. Please consult UCX documentation for further details.

done
Please activate the environment before using the packages:

source /path/to/deepmd-kit/bin/activate /path/to/deepmd-kit

This package enables TensorFlow, PyTorch, and JAX backends.

The following executable files have been installed:
1. DeePMD-kit CLi: dp -h
2. LAMMPS: lmp -h
3. DeePMD-kit i-Pi interface: dp_ipi
4. MPICH: mpirun -h
5. Horovod: horovod -h

The following Python libraries have been installed:
1. deepmd
2. dpdata
3. pylammps

If you have any questions, seek help from https://github.com/deepmodeling/deepmd-kit/discussions

installation finished.

Each code cell below begins with the following two lines, which activate the environment:

%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
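
Outside a notebook, the same pattern can be driven from a Python script. The sketch below is an illustration only: the helper names activated_script and run_in_env are hypothetical, and the prefix is the one the installer printed above (PREFIX=/root/deepmd-kit).

```python
import shlex
import subprocess

PREFIX = "/root/deepmd-kit"  # install prefix reported by the installer above


def activated_script(command, prefix=PREFIX):
    """Return a bash script that activates the environment, then runs `command`."""
    p = shlex.quote(prefix)
    return f"source {p}/bin/activate {p} && {command}"


def run_in_env(command, prefix=PREFIX):
    """Run `command` through bash inside the activated environment."""
    return subprocess.run(
        ["bash", "-c", activated_script(command, prefix)], check=True
    )
```

For example, run_in_env("dp -h") would do the same as a %%bash cell that sources the activate script and then calls dp -h.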

This step fixes the error "libdevice not found at ./libdevice.10.bc".

[10]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

pip install nvidia-cuda-nvcc-cu12

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting nvidia-cuda-nvcc-cu12
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/25/1f/faf9b791027ebd6354be68700da3c3d8a3b3db3bdcf2f8070f2e6871a7f1/nvidia_cuda_nvcc_cu12-12.6.85-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (21.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.2/21.2 MB 29.0 MB/s eta 0:00:00a 0:00:01
Installing collected packages: nvidia-cuda-nvcc-cu12
Successfully installed nvidia-cuda-nvcc-cu12-12.6.85
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
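
To check whether the fix took effect, one can look for the missing file under the environment. The sketch below assumes, without having verified the wheel's layout, that nvidia-cuda-nvcc-cu12 places a file named libdevice.10.bc somewhere under the environment prefix; the helper name find_libdevice is hypothetical.

```python
from pathlib import Path


def find_libdevice(root, name="libdevice.10.bc"):
    """Return every path under `root` whose file name equals `name`, sorted."""
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    # Assumed search root: the install prefix printed by the installer above.
    for path in find_libdevice("/root/deepmd-kit"):
        print(path)
```

An empty result would suggest that the file is still missing from the environment.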

[4]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp -h

usage: dp [-h]
          [-b {tensorflow,tf,jax,pytorch,pt} | --tensorflow | --jax | --pytorch]
          [--version]
          {transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from,neighbor-stat,change-bias,train-nvnmd,gui,convert-backend,show}
          ...

DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics

options:
  -h, --help            show this help message and exit
  -b {tensorflow,tf,jax,pytorch,pt}, --backend {tensorflow,tf,jax,pytorch,pt}
                        The backend of the model. Default can be set by environment variable DP_BACKEND. (default: tensorflow)
  --tensorflow, --tf    Alias for --backend tensorflow (default: None)
  --jax                 Alias for --backend jax (default: None)
  --pytorch, --pt       Alias for --backend pytorch (default: None)
  --version             show program's version number and exit

Valid subcommands:
  {transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from,neighbor-stat,change-bias,train-nvnmd,gui,convert-backend,show}
    transfer            (Supported backend: TensorFlow) pass parameters to another model
    train               train a model
    freeze              freeze the model
    test                test the model
    compress            Compress a model
    doc-train-input     print the documentation (in rst format) of input training parameters.
    model-devi          calculate model deviation
    convert-from        (Supported backend: TensorFlow) convert lower model version to supported version
    neighbor-stat       Calculate neighbor statistics
    change-bias         (Supported backend: PyTorch) Change model out bias according to the input data.
    train-nvnmd         (Supported backend: TensorFlow) train nvnmd model
    gui                 Serve DP-GUI.
    convert-backend     Convert model to another backend.
    show                Show the information of a model

Use --tf or --pt to choose the backend:
    dp --tf train input.json
    dp --pt train input.json
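
The help text above describes three ways to pick a backend: the -b/--backend flag (with tf and pt as short names), the DP_BACKEND environment variable, and the built-in default of tensorflow. The sketch below models that selection rule for illustration; it is not DeePMD-kit's own code. The function name resolve_backend is hypothetical, and the precedence of the flag over the environment variable is an assumption.

```python
import os

ALIASES = {"tf": "tensorflow", "pt": "pytorch"}  # short names listed in dp -h
BACKENDS = ("tensorflow", "jax", "pytorch")


def resolve_backend(flag=None, environ=None):
    """Pick a backend: explicit flag, then DP_BACKEND, then the default.

    Assumption: an explicit flag takes precedence over DP_BACKEND.
    """
    environ = os.environ if environ is None else environ
    name = flag or environ.get("DP_BACKEND") or "tensorflow"
    name = ALIASES.get(name.lower(), name.lower())
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return name
```

Under this model, setting DP_BACKEND=pytorch once in the shell should have the same effect as passing --pt to each dp command.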

Multi-backend training, freezing, and compression

We use DeePMD-kit's se_atten_compressible example for the demonstration and reduce the number of training steps to 1000.

[5]

!git clone https://github.com/deepmodeling/deepmd-kit

Cloning into 'deepmd-kit'...
remote: Enumerating objects: 36456, done.
remote: Counting objects: 100% (1438/1438), done.
remote: Compressing objects: 100% (1050/1050), done.
remote: Total 36456 (delta 757), reused 811 (delta 384), pack-reused 35018 (from 1)
Receiving objects: 100% (36456/36456), 63.99 MiB | 5.33 MiB/s, done.
Resolving deltas: 100% (27045/27045), done.

[6]

%cd deepmd-kit/examples/water/se_atten_compressible

/deepmd-kit/examples/water/se_atten_compressible

[7]

!sed -i "s/1000000/1000/g" input.json
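
The sed command rewrites every occurrence of 1000000 in the file, which works here but would also change any other value containing that string. A more targeted alternative is to edit the JSON directly. The sketch below assumes the step count is stored under training.numb_steps, the key name used in DeePMD-kit example inputs; check your own input.json before relying on it. The function names are hypothetical.

```python
import json
from pathlib import Path


def set_numb_steps(config, steps):
    """Return a copy of `config` with training.numb_steps set to `steps`."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated["training"]["numb_steps"] = steps
    return updated


def rewrite_input(path, steps):
    """Read the JSON file at `path`, update the step count, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(set_numb_steps(config, steps), indent=2) + "\n")
```

Calling rewrite_input("input.json", 1000) in the example directory would then replace the sed step.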

TensorFlow

[11]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --tf train input.json

2024-11-24 13:37:00.784789: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:37:00.802137: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:37:00.807370: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:37:00.819907: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[bohrium-156-1225901:00353] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1785462784/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00353] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
[2024-11-24 13:37:09,606] DEEPMD INFO    Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step)
[2024-11-24 13:37:09,680] DEEPMD INFO    If you encounter the error 'an illegal memory access was encountered', this may be due to a TensorFlow issue. To avoid this, set the environment variable DP_INFER_BATCH_SIZE to a smaller value than the last adjusted batch size. The environment variable DP_INFER_BATCH_SIZE controls the inference batch size (nframes * natoms). 
2024-11-24 13:37:10.921907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:37:10.928730: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 13:37:11.005802: I tensorflow/core/util/cuda_solvers.cc:178] Creating GpuSolver handles for stream 0x55d7b60649a0
[2024-11-24 13:37:13,446] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-11-24 13:37:13,492] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-11-24 13:37:13,599] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-11-24 13:37:13,791] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-11-24 13:37:14,372] DEEPMD INFO    training data with min nbor dist: 0.8854385688525499
[2024-11-24 13:37:14,373] DEEPMD INFO    training data with max nbor size: [108]
[2024-11-24 13:37:14,409] DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
[2024-11-24 13:37:14,409] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
[2024-11-24 13:37:14,409] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
[2024-11-24 13:37:14,409] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2024-11-24 13:37:14,409] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
[2024-11-24 13:37:14,409] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2024-11-24 13:37:14,409] DEEPMD INFO    Please read and cite:
[2024-11-24 13:37:14,409] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2024-11-24 13:37:14,409] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2024-11-24 13:37:14,409] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2024-11-24 13:37:14,409] DEEPMD INFO    ----------------------------------------------------------------------------------------
[2024-11-24 13:37:14,409] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2024-11-24 13:37:14,409] DEEPMD INFO    source:                
[2024-11-24 13:37:14,409] DEEPMD INFO    source branch:         HEAD
[2024-11-24 13:37:14,409] DEEPMD INFO    source commit:         b1be266
[2024-11-24 13:37:14,409] DEEPMD INFO    source commit at:      2024-11-23 01:37:55 -0800
[2024-11-24 13:37:14,410] DEEPMD INFO    use float prec:        double
[2024-11-24 13:37:14,410] DEEPMD INFO    build variant:         cuda
[2024-11-24 13:37:14,410] DEEPMD INFO    Backend:               TensorFlow
[2024-11-24 13:37:14,410] DEEPMD INFO    TF ver:                unknown
[2024-11-24 13:37:14,410] DEEPMD INFO    build with TF ver:     2.17.0
[2024-11-24 13:37:14,410] DEEPMD INFO    build with TF inc:     /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include/
[2024-11-24 13:37:14,410] DEEPMD INFO                           /root/deepmd-kit/include
[2024-11-24 13:37:14,410] DEEPMD INFO    build with TF lib:     
[2024-11-24 13:37:14,410] DEEPMD INFO    running on:            bohrium-156-1225901
[2024-11-24 13:37:14,410] DEEPMD INFO    computing device:      gpu:0
[2024-11-24 13:37:14,410] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2024-11-24 13:37:14,410] DEEPMD INFO    Count of visible GPUs: 1
[2024-11-24 13:37:14,410] DEEPMD INFO    num_intra_threads:     0
[2024-11-24 13:37:14,410] DEEPMD INFO    num_inter_threads:     0
[2024-11-24 13:37:14,410] DEEPMD INFO    ----------------------------------------------------------------------------------------
2024-11-24 13:37:14.414808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:37:14.418459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:14,427] DEEPMD INFO    ---Summary of DataSystem: training     -----------------------------------------------
[2024-11-24 13:37:14,427] DEEPMD INFO    found 3 system(s):
[2024-11-24 13:37:14,427] DEEPMD INFO                                        system  natoms  bch_sz   n_bch       prob  pbc
[2024-11-24 13:37:14,427] DEEPMD INFO                               ../data/data_0/     192       1      80  2.500e-01    T
[2024-11-24 13:37:14,427] DEEPMD INFO                               ../data/data_1/     192       1     160  5.000e-01    T
[2024-11-24 13:37:14,427] DEEPMD INFO                               ../data/data_2/     192       1      80  2.500e-01    T
[2024-11-24 13:37:14,427] DEEPMD INFO    --------------------------------------------------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO    ---Summary of DataSystem: validation   -----------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO    found 1 system(s):
[2024-11-24 13:37:14,430] DEEPMD INFO                                        system  natoms  bch_sz   n_bch       prob  pbc
[2024-11-24 13:37:14,430] DEEPMD INFO                                ../data/data_3     192       1      80  1.000e+00    T
[2024-11-24 13:37:14,430] DEEPMD INFO    --------------------------------------------------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO    training without frame parameter
[2024-11-24 13:37:14,430] DEEPMD INFO    data stating... (this step may take long time)
[2024-11-24 13:37:14,517] DEEPMD INFO    built lr
[2024-11-24 13:37:14,606] DEEPMD INFO    use the compressible model with stripped type embedding
[2024-11-24 13:37:15,184] DEEPMD INFO    built network
[2024-11-24 13:37:16,166] DEEPMD INFO    built training
[2024-11-24 13:37:16,167] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:37:16.169002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:16,210] DEEPMD INFO    initialize model from scratch
[2024-11-24 13:37:16,905] DEEPMD INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 11, decay_rate 0.893302, final lr will be 3.89e-08
[2024-11-24 13:37:19,189] DEEPMD INFO    batch       0: trn: rmse = 2.61e+01, rmse_e = 1.66e-01, rmse_f = 8.24e-01, lr = 1.00e-03
[2024-11-24 13:37:19,189] DEEPMD INFO    batch       0: val: rmse = 2.60e+01, rmse_e = 1.67e-01, rmse_f = 8.22e-01
2024-11-24 13:37:19.698894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:21,003] DEEPMD INFO    batch     100: trn: rmse = 1.51e+01, rmse_e = 1.23e-03, rmse_f = 7.94e-01, lr = 3.62e-04
[2024-11-24 13:37:21,003] DEEPMD INFO    batch     100: val: rmse = 1.55e+01, rmse_e = 2.31e-03, rmse_f = 8.13e-01
[2024-11-24 13:37:21,003] DEEPMD INFO    batch     100: total wall time = 4.10 s
[2024-11-24 13:37:22,227] DEEPMD INFO    batch     200: trn: rmse = 9.49e+00, rmse_e = 4.69e-03, rmse_f = 8.25e-01, lr = 1.31e-04
[2024-11-24 13:37:22,227] DEEPMD INFO    batch     200: val: rmse = 8.65e+00, rmse_e = 3.92e-03, rmse_f = 7.53e-01
[2024-11-24 13:37:22,227] DEEPMD INFO    batch     200: total wall time = 1.22 s
[2024-11-24 13:37:23,459] DEEPMD INFO    batch     300: trn: rmse = 5.62e+00, rmse_e = 3.26e-03, rmse_f = 8.07e-01, lr = 4.75e-05
[2024-11-24 13:37:23,459] DEEPMD INFO    batch     300: val: rmse = 5.55e+00, rmse_e = 7.08e-03, rmse_f = 7.96e-01
[2024-11-24 13:37:23,459] DEEPMD INFO    batch     300: total wall time = 1.23 s
[2024-11-24 13:37:24,683] DEEPMD INFO    batch     400: trn: rmse = 3.62e+00, rmse_e = 2.03e-03, rmse_f = 8.48e-01, lr = 1.72e-05
[2024-11-24 13:37:24,684] DEEPMD INFO    batch     400: val: rmse = 3.45e+00, rmse_e = 1.08e-03, rmse_f = 8.08e-01
[2024-11-24 13:37:24,684] DEEPMD INFO    batch     400: total wall time = 1.22 s
[2024-11-24 13:37:25,911] DEEPMD INFO    batch     500: trn: rmse = 1.99e+00, rmse_e = 1.98e-03, rmse_f = 7.40e-01, lr = 6.24e-06
[2024-11-24 13:37:25,911] DEEPMD INFO    batch     500: val: rmse = 1.98e+00, rmse_e = 3.81e-03, rmse_f = 7.36e-01
[2024-11-24 13:37:25,911] DEEPMD INFO    batch     500: total wall time = 1.23 s
[2024-11-24 13:37:27,140] DEEPMD INFO    batch     600: trn: rmse = 1.47e+00, rmse_e = 2.12e-03, rmse_f = 8.15e-01, lr = 2.26e-06
[2024-11-24 13:37:27,140] DEEPMD INFO    batch     600: val: rmse = 1.44e+00, rmse_e = 2.21e-03, rmse_f = 7.95e-01
[2024-11-24 13:37:27,140] DEEPMD INFO    batch     600: total wall time = 1.23 s
[2024-11-24 13:37:28,372] DEEPMD INFO    batch     700: trn: rmse = 9.39e-01, rmse_e = 3.48e-04, rmse_f = 6.96e-01, lr = 8.18e-07
[2024-11-24 13:37:28,372] DEEPMD INFO    batch     700: val: rmse = 1.03e+00, rmse_e = 2.61e-03, rmse_f = 7.61e-01
[2024-11-24 13:37:28,372] DEEPMD INFO    batch     700: total wall time = 1.23 s
[2024-11-24 13:37:29,602] DEEPMD INFO    batch     800: trn: rmse = 9.69e-01, rmse_e = 5.29e-03, rmse_f = 8.49e-01, lr = 2.96e-07
[2024-11-24 13:37:29,602] DEEPMD INFO    batch     800: val: rmse = 9.12e-01, rmse_e = 2.68e-03, rmse_f = 8.00e-01
[2024-11-24 13:37:29,602] DEEPMD INFO    batch     800: total wall time = 1.23 s
[2024-11-24 13:37:30,830] DEEPMD INFO    batch     900: trn: rmse = 8.57e-01, rmse_e = 7.61e-03, rmse_f = 8.09e-01, lr = 1.07e-07
[2024-11-24 13:37:30,830] DEEPMD INFO    batch     900: val: rmse = 8.64e-01, rmse_e = 3.19e-03, rmse_f = 8.19e-01
[2024-11-24 13:37:30,830] DEEPMD INFO    batch     900: total wall time = 1.23 s
[2024-11-24 13:37:32,057] DEEPMD INFO    batch    1000: trn: rmse = 7.97e-01, rmse_e = 3.67e-03, rmse_f = 7.80e-01, lr = 3.89e-08
[2024-11-24 13:37:32,057] DEEPMD INFO    batch    1000: val: rmse = 8.20e-01, rmse_e = 3.26e-03, rmse_f = 8.03e-01
[2024-11-24 13:37:32,057] DEEPMD INFO    batch    1000: total wall time = 1.23 s
[2024-11-24 13:37:32,209] DEEPMD INFO    saved checkpoint model.ckpt
[2024-11-24 13:37:32,209] DEEPMD INFO    average training time: 0.0119 s/batch (exclude first 100 batches)
[2024-11-24 13:37:32,209] DEEPMD INFO    finished training
[2024-11-24 13:37:32,209] DEEPMD INFO    wall time: 16.042 s
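
The lr column in the log can be reproduced from the three numbers printed on the "start training" line. The sketch below assumes a staircase exponential schedule, lr(step) = start_lr * decay_rate ** (step // decay_step). That form is an assumption, but it is consistent with the values logged above. The constant and function names are illustrative.

```python
START_LR = 1.00e-03    # "start training at lr 1.00e-03"
DECAY_STEP = 11        # "decay_step 11"
DECAY_RATE = 0.893302  # "decay_rate 0.893302"


def learning_rate(step):
    """Staircase exponential decay: the rate drops once every DECAY_STEP batches."""
    return START_LR * DECAY_RATE ** (step // DECAY_STEP)


for step in (100, 200, 500, 1000):
    print(f"batch {step:4d}: lr = {learning_rate(step):.2e}")
```

At batch 100 this gives 3.62e-04 and at batch 1000 it gives 3.89e-08, matching both the logged lr values and the "final lr" announced at the start of training.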

[12]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --tf freeze

2024-11-24 13:38:05.540648: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:38:05.558121: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:38:05.563439: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:38:05.575998: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[bohrium-156-1225901:00626] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1823997952/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00626] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
2024-11-24 13:38:11.055650: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-11-24 13:38:11.055917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:38:11.094493: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
[2024-11-24 13:38:11,246] DEEPMD INFO    The following nodes will be frozen: ['fitting_attr/daparam', 'o_force', 'model_attr/model_version', 'model_attr/tmap', 'fitting_attr/dfparam', 'model_attr/model_type', 'descrpt_attr/rcut', 't_mesh', 'train_attr/min_nbor_dist', 'model_type', 'train_attr/training_script', 'o_energy', 'o_atom_energy', 'descrpt_attr/ntypes', 'o_virial', 'o_atom_virial']
[2024-11-24 13:38:11,517] DEEPMD INFO    782 ops in the final graph.

[16]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --tf compress

2024-11-24 13:41:09.288551: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:41:09.305736: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:41:09.310908: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:41:09.323223: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:41:13.308673: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-11-24 13:41:13.308915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:13.334481: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 13:41:13.357394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:13.385652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:13,459] DEEPMD INFO    


[2024-11-24 13:41:13,459] DEEPMD INFO    stage 1: compress the model
[bohrium-156-1225901:00959] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1874722816/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00959] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
[2024-11-24 13:41:19,458] DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
[2024-11-24 13:41:19,458] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
[2024-11-24 13:41:19,458] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
[2024-11-24 13:41:19,458] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2024-11-24 13:41:19,458] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
[2024-11-24 13:41:19,458] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2024-11-24 13:41:19,458] DEEPMD INFO    Please read and cite:
[2024-11-24 13:41:19,458] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2024-11-24 13:41:19,458] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2024-11-24 13:41:19,458] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2024-11-24 13:41:19,458] DEEPMD INFO    ----------------------------------------------------------------------------------------
[2024-11-24 13:41:19,458] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2024-11-24 13:41:19,458] DEEPMD INFO    source:                
[2024-11-24 13:41:19,458] DEEPMD INFO    source branch:         HEAD
[2024-11-24 13:41:19,458] DEEPMD INFO    source commit:         b1be266
[2024-11-24 13:41:19,458] DEEPMD INFO    source commit at:      2024-11-23 01:37:55 -0800
[2024-11-24 13:41:19,458] DEEPMD INFO    use float prec:        double
[2024-11-24 13:41:19,458] DEEPMD INFO    build variant:         cuda
[2024-11-24 13:41:19,458] DEEPMD INFO    Backend:               TensorFlow
[2024-11-24 13:41:19,458] DEEPMD INFO    TF ver:                unknown
[2024-11-24 13:41:19,458] DEEPMD INFO    build with TF ver:     2.17.0
[2024-11-24 13:41:19,458] DEEPMD INFO    build with TF inc:     /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include/
[2024-11-24 13:41:19,459] DEEPMD INFO                           /root/deepmd-kit/include
[2024-11-24 13:41:19,459] DEEPMD INFO    build with TF lib:     
[2024-11-24 13:41:19,459] DEEPMD INFO    running on:            bohrium-156-1225901
[2024-11-24 13:41:19,459] DEEPMD INFO    computing device:      gpu:0
[2024-11-24 13:41:19,459] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2024-11-24 13:41:19,459] DEEPMD INFO    Count of visible GPUs: 1
[2024-11-24 13:41:19,459] DEEPMD INFO    num_intra_threads:     0
[2024-11-24 13:41:19,459] DEEPMD INFO    num_inter_threads:     0
[2024-11-24 13:41:19,459] DEEPMD INFO    ----------------------------------------------------------------------------------------
2024-11-24 13:41:19.465275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.468901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:19,469] DEEPMD INFO    training without frame parameter
2024-11-24 13:41:19.537218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.538325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.561792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:19,587] DEEPMD INFO    training data with lower boundary: [-0.3899236  -0.42140616]
[2024-11-24 13:41:19,587] DEEPMD INFO    training data with upper boundary: [7.18840882 8.14703945]
2024-11-24 13:41:19.901264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.938585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.961196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.984552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:20.014652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:20,047] DEEPMD INFO    built lr
[2024-11-24 13:41:20,292] DEEPMD INFO    use the compressible model with stripped type embedding
[2024-11-24 13:41:20,657] DEEPMD INFO    built network
[2024-11-24 13:41:21,115] DEEPMD INFO    built training
[2024-11-24 13:41:21,115] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:41:21.117035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:21,141] DEEPMD INFO    initialize model from scratch
[2024-11-24 13:41:21,535] DEEPMD INFO    finished compressing
[2024-11-24 13:41:21,541] DEEPMD INFO    


[2024-11-24 13:41:21,541] DEEPMD INFO    stage 2: freeze the model
2024-11-24 13:41:21.762211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:21,891] DEEPMD INFO    The following nodes will be frozen: ['t_mesh', 'o_atom_energy', 'model_attr/tmap', 'o_energy', 'descrpt_attr/ntypes', 'descrpt_attr/rcut', 'o_atom_virial', 'model_attr/model_version', 'fitting_attr/daparam', 'train_attr/min_nbor_dist', 'o_force', 'model_type', 'o_virial', 'fitting_attr/dfparam', 'train_attr/training_script', 'model_attr/model_type']
[2024-11-24 13:41:21,999] DEEPMD INFO    633 ops in the final graph.


PyTorch


[17]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --pt train input.json

To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-24 13:41:32,604] DEEPMD INFO    DeePMD version: 3.0.0
[2024-11-24 13:41:32,604] DEEPMD INFO    Configuration path: input.json
[bohrium-156-1225901:01051] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/2329346048/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:01051] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
[2024-11-24 13:41:33,817] DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
[2024-11-24 13:41:33,817] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
[2024-11-24 13:41:33,817] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
[2024-11-24 13:41:33,817] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2024-11-24 13:41:33,817] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
[2024-11-24 13:41:33,817] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2024-11-24 13:41:33,817] DEEPMD INFO    Please read and cite:
[2024-11-24 13:41:33,817] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2024-11-24 13:41:33,817] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2024-11-24 13:41:33,817] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2024-11-24 13:41:33,817] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2024-11-24 13:41:33,817] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2024-11-24 13:41:33,818] DEEPMD INFO    source:                
[2024-11-24 13:41:33,818] DEEPMD INFO    source branch:         HEAD
[2024-11-24 13:41:33,818] DEEPMD INFO    source commit:         b1be266
[2024-11-24 13:41:33,818] DEEPMD INFO    source commit at:      2024-11-23 01:37:55 -0800
[2024-11-24 13:41:33,818] DEEPMD INFO    use float prec:        double
[2024-11-24 13:41:33,818] DEEPMD INFO    build variant:         cuda
[2024-11-24 13:41:33,818] DEEPMD INFO    Backend:               PyTorch
[2024-11-24 13:41:33,818] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2024-11-24 13:41:33,818] DEEPMD INFO    Enable custom OP:      True
[2024-11-24 13:41:33,818] DEEPMD INFO    build with PT ver:     2.4.1
[2024-11-24 13:41:33,818] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2024-11-24 13:41:33,818] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2024-11-24 13:41:33,818] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2024-11-24 13:41:33,818] DEEPMD INFO    running on:            bohrium-156-1225901
[2024-11-24 13:41:33,818] DEEPMD INFO    computing device:      cuda:0
[2024-11-24 13:41:33,818] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2024-11-24 13:41:33,818] DEEPMD INFO    Count of visible GPUs: 1
[2024-11-24 13:41:33,818] DEEPMD INFO    num_intra_threads:     0
[2024-11-24 13:41:33,818] DEEPMD INFO    num_inter_threads:     0
[2024-11-24 13:41:33,818] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2024-11-24 13:41:33,975] DEEPMD INFO    Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step)
[2024-11-24 13:41:37,538] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-11-24 13:41:37,657] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-11-24 13:41:37,747] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-11-24 13:41:38,111] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-11-24 13:41:38,179] DEEPMD INFO    training data with min nbor dist: 0.8854385688525499
[2024-11-24 13:41:38,180] DEEPMD INFO    training data with max nbor size: [108]
[2024-11-24 13:41:38,238] DEEPMD INFO    Packing data for statistics from 3 systems
[2024-11-24 13:41:38,341] DEEPMD INFO    RMSE of energy per atom after linear regression is: 0.003581501976900343 in the unit of energy.
[2024-11-24 13:41:38,860] DEEPMD INFO    ---Summary of DataSystem: training     -----------------------------------------------
[2024-11-24 13:41:38,860] DEEPMD INFO    found 3 system(s):
[2024-11-24 13:41:38,860] DEEPMD INFO                                        system  natoms  bch_sz   n_bch       prob  pbc
[2024-11-24 13:41:38,860] DEEPMD INFO                               ../data/data_0/     192       1      80  2.500e-01    T
[2024-11-24 13:41:38,860] DEEPMD INFO                               ../data/data_1/     192       1     160  5.000e-01    T
[2024-11-24 13:41:38,860] DEEPMD INFO                               ../data/data_2/     192       1      80  2.500e-01    T
[2024-11-24 13:41:38,861] DEEPMD INFO    --------------------------------------------------------------------------------------
[2024-11-24 13:41:38,863] DEEPMD INFO    ---Summary of DataSystem: validation   -----------------------------------------------
[2024-11-24 13:41:38,863] DEEPMD INFO    found 1 system(s):
[2024-11-24 13:41:38,864] DEEPMD INFO                                        system  natoms  bch_sz   n_bch       prob  pbc
[2024-11-24 13:41:38,864] DEEPMD INFO                                ../data/data_3     192       1      80  1.000e+00    T
[2024-11-24 13:41:38,864] DEEPMD INFO    --------------------------------------------------------------------------------------
[2024-11-24 13:41:38,867] DEEPMD INFO    Start to train 1000 steps.
[2024-11-24 13:41:40,496] DEEPMD INFO    batch       1: trn: rmse = 2.54e+01, rmse_e = 1.80e+00, rmse_f = 7.94e-01, lr = 1.00e-03
[2024-11-24 13:41:40,496] DEEPMD INFO    batch       1: val: rmse = 3.66e+01, rmse_e = 1.94e+00, rmse_f = 1.15e+00
[2024-11-24 13:41:40,496] DEEPMD INFO    batch       1: total wall time = 1.63 s
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: trn: rmse = 1.43e+01, rmse_e = 3.69e-03, rmse_f = 7.52e-01, lr = 3.62e-04
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: val: rmse = 1.46e+01, rmse_e = 3.72e-03, rmse_f = 7.65e-01
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: total wall time = 1.81 s
[2024-11-24 13:41:44,146] DEEPMD INFO    batch     200: trn: rmse = 7.71e+00, rmse_e = 1.72e-02, rmse_f = 6.70e-01, lr = 1.31e-04
[2024-11-24 13:41:44,146] DEEPMD INFO    batch     200: val: rmse = 8.22e+00, rmse_e = 9.88e-03, rmse_f = 7.15e-01
[2024-11-24 13:41:44,146] DEEPMD INFO    batch     200: total wall time = 1.84 s
[2024-11-24 13:41:45,985] DEEPMD INFO    batch     300: trn: rmse = 4.86e+00, rmse_e = 1.64e-03, rmse_f = 6.98e-01, lr = 4.75e-05
[2024-11-24 13:41:45,986] DEEPMD INFO    batch     300: val: rmse = 4.81e+00, rmse_e = 3.24e-03, rmse_f = 6.91e-01
[2024-11-24 13:41:45,986] DEEPMD INFO    batch     300: total wall time = 1.84 s
[2024-11-24 13:41:47,826] DEEPMD INFO    batch     400: trn: rmse = 3.04e+00, rmse_e = 1.54e-05, rmse_f = 7.13e-01, lr = 1.72e-05
[2024-11-24 13:41:47,826] DEEPMD INFO    batch     400: val: rmse = 3.07e+00, rmse_e = 1.01e-03, rmse_f = 7.20e-01
[2024-11-24 13:41:47,827] DEEPMD INFO    batch     400: total wall time = 1.84 s
[2024-11-24 13:41:49,651] DEEPMD INFO    batch     500: trn: rmse = 1.83e+00, rmse_e = 4.46e-03, rmse_f = 6.80e-01, lr = 6.24e-06
[2024-11-24 13:41:49,652] DEEPMD INFO    batch     500: val: rmse = 1.85e+00, rmse_e = 1.77e-03, rmse_f = 6.88e-01
[2024-11-24 13:41:49,652] DEEPMD INFO    batch     500: total wall time = 1.83 s
[2024-11-24 13:41:51,479] DEEPMD INFO    batch     600: trn: rmse = 1.27e+00, rmse_e = 9.51e-04, rmse_f = 7.03e-01, lr = 2.26e-06
[2024-11-24 13:41:51,479] DEEPMD INFO    batch     600: val: rmse = 1.20e+00, rmse_e = 2.42e-03, rmse_f = 6.64e-01
[2024-11-24 13:41:51,479] DEEPMD INFO    batch     600: total wall time = 1.83 s
[2024-11-24 13:41:53,297] DEEPMD INFO    batch     700: trn: rmse = 8.49e-01, rmse_e = 3.27e-04, rmse_f = 6.30e-01, lr = 8.18e-07
[2024-11-24 13:41:53,297] DEEPMD INFO    batch     700: val: rmse = 9.10e-01, rmse_e = 3.94e-03, rmse_f = 6.74e-01
[2024-11-24 13:41:53,297] DEEPMD INFO    batch     700: total wall time = 1.82 s
[2024-11-24 13:41:55,116] DEEPMD INFO    batch     800: trn: rmse = 7.90e-01, rmse_e = 1.85e-04, rmse_f = 6.94e-01, lr = 2.96e-07
[2024-11-24 13:41:55,116] DEEPMD INFO    batch     800: val: rmse = 7.95e-01, rmse_e = 7.31e-03, rmse_f = 6.91e-01
[2024-11-24 13:41:55,116] DEEPMD INFO    batch     800: total wall time = 1.82 s
[2024-11-24 13:41:56,935] DEEPMD INFO    batch     900: trn: rmse = 7.36e-01, rmse_e = 3.92e-03, rmse_f = 6.98e-01, lr = 1.07e-07
[2024-11-24 13:41:56,935] DEEPMD INFO    batch     900: val: rmse = 7.23e-01, rmse_e = 2.54e-03, rmse_f = 6.86e-01
[2024-11-24 13:41:56,935] DEEPMD INFO    batch     900: total wall time = 1.82 s
[2024-11-24 13:41:58,762] DEEPMD INFO    batch    1000: trn: rmse = 7.16e-01, rmse_e = 1.34e-03, rmse_f = 7.02e-01, lr = 3.89e-08
[2024-11-24 13:41:58,762] DEEPMD INFO    batch    1000: val: rmse = 7.11e-01, rmse_e = 7.25e-03, rmse_f = 6.90e-01
[2024-11-24 13:41:58,762] DEEPMD INFO    batch    1000: total wall time = 1.83 s
[2024-11-24 13:41:58,799] DEEPMD INFO    Saved model to model.ckpt-1000.pt
[2024-11-24 13:41:58,800] DEEPMD INFO    average training time: 0.0165 s/batch
[2024-11-24 13:41:58,800] DEEPMD INFO    Trained model has been saved to: model.ckpt
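
The per-batch lines in the training log above have a regular shape, so they can be turned into structured records for plotting or comparison. The following is a minimal sketch that assumes the line format shown in this log; `parse_metrics` and `LINE_RE` are hypothetical names introduced here for illustration and are not part of DeePMD-kit.

```python
import re

# A few lines copied verbatim from the training log above.
LOG = """\
[2024-11-24 13:41:40,496] DEEPMD INFO    batch       1: trn: rmse = 2.54e+01, rmse_e = 1.80e+00, rmse_f = 7.94e-01, lr = 1.00e-03
[2024-11-24 13:41:40,496] DEEPMD INFO    batch       1: val: rmse = 3.66e+01, rmse_e = 1.94e+00, rmse_f = 1.15e+00
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: trn: rmse = 1.43e+01, rmse_e = 3.69e-03, rmse_f = 7.52e-01, lr = 3.62e-04
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: val: rmse = 1.46e+01, rmse_e = 3.72e-03, rmse_f = 7.65e-01
[2024-11-24 13:41:42,309] DEEPMD INFO    batch     100: total wall time = 1.81 s
"""

# Matches "batch <step>: trn: ..." and "batch <step>: val: ..." lines only;
# the "total wall time" lines are skipped because they lack a trn/val tag.
LINE_RE = re.compile(r"batch\s+(\d+): (trn|val): (.*)$")


def parse_metrics(log_text):
    """Return one dict per trn/val line with the step, the split, and each metric."""
    records = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        record = {"step": int(match.group(1)), "split": match.group(2)}
        # The remainder looks like "rmse = 2.54e+01, rmse_e = 1.80e+00, ..."
        for item in match.group(3).split(","):
            key, value = item.split("=")
            record[key.strip()] = float(value)
        records.append(record)
    return records


records = parse_metrics(LOG)
for record in records:
    print(record)
```

Only the training (`trn`) lines carry an `lr` entry, so validation records have no `lr` key.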


[18]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --pt freeze

To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-24 13:42:29,848] DEEPMD INFO    DeePMD version: 3.0.0
[2024-11-24 13:42:32,084] DEEPMD INFO    Saved frozen model to frozen_model.pth


[19]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp --pt compress

To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-24 13:42:36,619] DEEPMD INFO    DeePMD version: 3.0.0
[2024-11-24 13:42:38,442] DEEPMD INFO    training data with lower boundary: [[-0.38978084 -0.         -0.         -0.        ]
 [-0.42122849 -0.         -0.         -0.        ]]
[2024-11-24 13:42:38,442] DEEPMD INFO    training data with upper boundary: [[ 7.19234743 12.23617724 12.23617724 12.23617724]
 [ 8.15177409 13.68445638 13.68445638 13.68445638]]


We now have four model files: frozen_model.pb, frozen_model_compressed.pb, frozen_model.pth, and frozen_model_compressed.pth.


Model conversion


The JAX backend does not currently support training, so we use dp convert-backend to convert the PyTorch model file into a JAX model file:


[20]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp convert-backend frozen_model.pth frozen_model.savedmodel

[2024-11-24 13:43:31,664] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:43:35.224619: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:43:35.244121: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:43:35.250210: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Model testing


dp test determines the backend from the model file's extension, so there is no need to pass --tf or --pt.

A reminder: the model was trained for only 1000 steps, so large RMSE values are expected.
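
The idea of choosing a backend from the file extension can be sketched as a simple lookup. This is an illustration only, not DeePMD-kit's actual implementation: the mapping covers just the three extensions used in this tutorial, and `guess_backend` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions of the model files produced in this tutorial and the backend
# each one belongs to (assumption: other extensions are out of scope here).
SUFFIX_TO_BACKEND = {
    ".pb": "TensorFlow",
    ".pth": "PyTorch",
    ".savedmodel": "JAX",
}


def guess_backend(model_file):
    """Return the backend name implied by the model file's extension."""
    suffix = Path(model_file).suffix
    try:
        return SUFFIX_TO_BACKEND[suffix]
    except KeyError:
        raise ValueError(f"unrecognized model suffix: {suffix!r}") from None


for name in (
    "frozen_model_compressed.pb",
    "frozen_model_compressed.pth",
    "frozen_model.savedmodel",
):
    print(name, "->", guess_backend(name))
```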


[22]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp test -m frozen_model_compressed.pb -s ../data

2024-11-24 13:45:14.152459: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:45:14.169647: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:45:14.174884: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-11-24 13:45:16,573] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-24 13:45:18,327] DEEPMD INFO    If you encounter the error 'an illegal memory access was encountered', this may be due to a TensorFlow issue. To avoid this, set the environment variable DP_INFER_BATCH_SIZE to a smaller value than the last adjusted batch size. The environment variable DP_INFER_BATCH_SIZE controls the inference batch size (nframes * natoms). 
[2024-11-24 13:45:18,357] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:45:18,357] DEEPMD INFO    # testing system : ../data/data_1
[2024-11-24 13:45:21,221] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-11-24 13:45:21,244] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-11-24 13:45:21,287] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-11-24 13:45:21,369] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-11-24 13:45:21,546] DEEPMD INFO    # number of test data : 160 
[2024-11-24 13:45:21,546] DEEPMD INFO    Energy MAE         : 5.179024e-01 eV
[2024-11-24 13:45:21,546] DEEPMD INFO    Energy RMSE        : 6.224318e-01 eV
[2024-11-24 13:45:21,546] DEEPMD INFO    Energy MAE/Natoms  : 2.697408e-03 eV
[2024-11-24 13:45:21,546] DEEPMD INFO    Energy RMSE/Natoms : 3.241832e-03 eV
[2024-11-24 13:45:21,546] DEEPMD INFO    Force  MAE         : 5.844789e-01 eV/A
[2024-11-24 13:45:21,546] DEEPMD INFO    Force  RMSE        : 7.829774e-01 eV/A
[2024-11-24 13:45:21,547] DEEPMD INFO    Virial MAE         : 1.972750e+01 eV
[2024-11-24 13:45:21,547] DEEPMD INFO    Virial RMSE        : 3.339188e+01 eV
[2024-11-24 13:45:21,547] DEEPMD INFO    Virial MAE/Natoms  : 1.027474e-01 eV
[2024-11-24 13:45:21,547] DEEPMD INFO    Virial RMSE/Natoms : 1.739160e-01 eV
[2024-11-24 13:45:21,547] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:45:21,547] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:45:21,547] DEEPMD INFO    # testing system : ../data/data_2
[2024-11-24 13:45:21,627] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:45:21,628] DEEPMD INFO    Energy MAE         : 4.779663e-01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Energy RMSE        : 6.033248e-01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Energy MAE/Natoms  : 2.489408e-03 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Energy RMSE/Natoms : 3.142316e-03 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Force  MAE         : 5.813479e-01 eV/A
[2024-11-24 13:45:21,628] DEEPMD INFO    Force  RMSE        : 7.761452e-01 eV/A
[2024-11-24 13:45:21,628] DEEPMD INFO    Virial MAE         : 1.961763e+01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Virial RMSE        : 3.325970e+01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Virial MAE/Natoms  : 1.021752e-01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    Virial RMSE/Natoms : 1.732276e-01 eV
[2024-11-24 13:45:21,628] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:45:21,628] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:45:21,628] DEEPMD INFO    # testing system : ../data/data_0
[2024-11-24 13:45:21,710] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:45:21,710] DEEPMD INFO    Energy MAE         : 5.671510e-01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Energy RMSE        : 7.216989e-01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Energy MAE/Natoms  : 2.953911e-03 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Energy RMSE/Natoms : 3.758848e-03 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Force  MAE         : 5.845286e-01 eV/A
[2024-11-24 13:45:21,710] DEEPMD INFO    Force  RMSE        : 7.833679e-01 eV/A
[2024-11-24 13:45:21,710] DEEPMD INFO    Virial MAE         : 1.964959e+01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Virial RMSE        : 3.334908e+01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Virial MAE/Natoms  : 1.023416e-01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    Virial RMSE/Natoms : 1.736931e-01 eV
[2024-11-24 13:45:21,710] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:45:21,710] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:45:21,710] DEEPMD INFO    # testing system : ../data/data_3
[2024-11-24 13:45:21,789] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:45:21,789] DEEPMD INFO    Energy MAE         : 5.817025e-01 eV
[2024-11-24 13:45:21,789] DEEPMD INFO    Energy RMSE        : 7.101678e-01 eV
[2024-11-24 13:45:21,789] DEEPMD INFO    Energy MAE/Natoms  : 3.029701e-03 eV
[2024-11-24 13:45:21,789] DEEPMD INFO    Energy RMSE/Natoms : 3.698790e-03 eV
[2024-11-24 13:45:21,789] DEEPMD INFO    Force  MAE         : 5.875087e-01 eV/A
[2024-11-24 13:45:21,789] DEEPMD INFO    Force  RMSE        : 7.852253e-01 eV/A
[2024-11-24 13:45:21,789] DEEPMD INFO    Virial MAE         : 1.962188e+01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial RMSE        : 3.324730e+01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial MAE/Natoms  : 1.021973e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial RMSE/Natoms : 1.731630e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:45:21,790] DEEPMD INFO    # ----------weighted average of errors----------- 
[2024-11-24 13:45:21,790] DEEPMD INFO    # number of systems : 4
[2024-11-24 13:45:21,790] DEEPMD INFO    Energy MAE         : 5.325249e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Energy RMSE        : 6.578801e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Energy MAE/Natoms  : 2.773567e-03 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Energy RMSE/Natoms : 3.426459e-03 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Force  MAE         : 5.844686e-01 eV/A
[2024-11-24 13:45:21,790] DEEPMD INFO    Force  RMSE        : 7.821448e-01 eV/A
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial MAE         : 1.966882e+01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial RMSE        : 3.332803e+01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial MAE/Natoms  : 1.024418e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    Virial RMSE/Natoms : 1.735835e-01 eV
[2024-11-24 13:45:21,790] DEEPMD INFO    # -----------------------------------------------
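
The "weighted average of errors" block at the end of this output can be reproduced from the per-system numbers. The sketch below assumes that MAE is averaged with the number of test frames as weight and that RMSE is the square root of the frame-weighted mean of squared RMSEs. This weighting is inferred from the numbers in the log, not taken from the DeePMD-kit source.

```python
import math

# (number of test frames, energy MAE in eV, energy RMSE in eV) for each system,
# copied from the dp test output above.
systems = {
    "../data/data_1": (160, 5.179024e-01, 6.224318e-01),
    "../data/data_2": (80, 4.779663e-01, 6.033248e-01),
    "../data/data_0": (80, 5.671510e-01, 7.216989e-01),
    "../data/data_3": (80, 5.817025e-01, 7.101678e-01),
}

total_frames = sum(n for n, _, _ in systems.values())

# MAE: plain frame-weighted mean.
mae = sum(n * m for n, m, _ in systems.values()) / total_frames

# RMSE: average the squared errors with frame weights, then take the root.
rmse = math.sqrt(sum(n * r**2 for n, _, r in systems.values()) / total_frames)

print(f"Energy MAE  (weighted): {mae:.6e} eV")
print(f"Energy RMSE (weighted): {rmse:.6e} eV")
```

Under these assumptions the results agree with the logged weighted averages (5.325249e-01 eV and 6.578801e-01 eV) to within rounding of the printed inputs.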


[24]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp test -m frozen_model_compressed.pth -s ../data

[2024-11-24 13:46:02,650] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-24 13:46:05,378] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:46:05,378] DEEPMD INFO    # testing system : ../data/data_1
[2024-11-24 13:46:07,995] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-11-24 13:46:08,686] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-11-24 13:46:09,696] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-11-24 13:46:10,442] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-11-24 13:46:11,741] DEEPMD INFO    # number of test data : 160 
[2024-11-24 13:46:11,741] DEEPMD INFO    Energy MAE         : 6.760290e-01 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Energy RMSE        : 8.614848e-01 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Energy MAE/Natoms  : 3.520984e-03 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Energy RMSE/Natoms : 4.486900e-03 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Force  MAE         : 5.224798e-01 eV/A
[2024-11-24 13:46:11,741] DEEPMD INFO    Force  RMSE        : 6.874469e-01 eV/A
[2024-11-24 13:46:11,741] DEEPMD INFO    Virial MAE         : 1.055268e+02 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Virial RMSE        : 1.807128e+02 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Virial MAE/Natoms  : 5.496190e-01 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    Virial RMSE/Natoms : 9.412124e-01 eV
[2024-11-24 13:46:11,741] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:46:11,741] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:46:11,741] DEEPMD INFO    # testing system : ../data/data_2
[2024-11-24 13:46:11,867] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:46:11,868] DEEPMD INFO    Energy MAE         : 5.619077e-01 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Energy RMSE        : 7.171949e-01 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Energy MAE/Natoms  : 2.926602e-03 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Energy RMSE/Natoms : 3.735390e-03 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Force  MAE         : 5.206556e-01 eV/A
[2024-11-24 13:46:11,868] DEEPMD INFO    Force  RMSE        : 6.828324e-01 eV/A
[2024-11-24 13:46:11,868] DEEPMD INFO    Virial MAE         : 1.053576e+02 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Virial RMSE        : 1.804098e+02 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Virial MAE/Natoms  : 5.487373e-01 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    Virial RMSE/Natoms : 9.396344e-01 eV
[2024-11-24 13:46:11,868] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:46:11,868] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:46:11,868] DEEPMD INFO    # testing system : ../data/data_0
[2024-11-24 13:46:11,991] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:46:11,992] DEEPMD INFO    Energy MAE         : 7.195613e-01 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Energy RMSE        : 8.736386e-01 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Energy MAE/Natoms  : 3.747715e-03 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Energy RMSE/Natoms : 4.550201e-03 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Force  MAE         : 5.222212e-01 eV/A
[2024-11-24 13:46:11,992] DEEPMD INFO    Force  RMSE        : 6.863119e-01 eV/A
[2024-11-24 13:46:11,992] DEEPMD INFO    Virial MAE         : 1.055419e+02 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Virial RMSE        : 1.807917e+02 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Virial MAE/Natoms  : 5.496976e-01 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    Virial RMSE/Natoms : 9.416233e-01 eV
[2024-11-24 13:46:11,992] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:46:11,992] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:46:11,992] DEEPMD INFO    # testing system : ../data/data_3
[2024-11-24 13:46:12,114] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:46:12,114] DEEPMD INFO    Energy MAE         : 6.737088e-01 eV
[2024-11-24 13:46:12,114] DEEPMD INFO    Energy RMSE        : 8.147767e-01 eV
[2024-11-24 13:46:12,114] DEEPMD INFO    Energy MAE/Natoms  : 3.508900e-03 eV
[2024-11-24 13:46:12,114] DEEPMD INFO    Energy RMSE/Natoms : 4.243629e-03 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Force  MAE         : 5.249346e-01 eV/A
[2024-11-24 13:46:12,115] DEEPMD INFO    Force  RMSE        : 6.895287e-01 eV/A
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial MAE         : 1.051727e+02 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial RMSE        : 1.802760e+02 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial MAE/Natoms  : 5.477746e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial RMSE/Natoms : 9.389376e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:46:12,115] DEEPMD INFO    # ----------weighted average of errors----------- 
[2024-11-24 13:46:12,115] DEEPMD INFO    # number of systems : 4
[2024-11-24 13:46:12,115] DEEPMD INFO    Energy MAE         : 6.614471e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Energy RMSE        : 8.277423e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Energy MAE/Natoms  : 3.445037e-03 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Energy RMSE/Natoms : 4.311158e-03 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Force  MAE         : 5.225542e-01 eV/A
[2024-11-24 13:46:12,115] DEEPMD INFO    Force  RMSE        : 6.867169e-01 eV/A
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial MAE         : 1.054252e+02 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial RMSE        : 1.805807e+02 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial MAE/Natoms  : 5.490895e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    Virial RMSE/Natoms : 9.405246e-01 eV
[2024-11-24 13:46:12,115] DEEPMD INFO    # -----------------------------------------------

[26]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

dp test -m frozen_model.savedmodel -s ../data

2024-11-24 13:46:45.393625: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:46:45.411187: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:46:45.416461: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING:absl:Importing a function (__inference_internal_grad_fn_2447) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2372) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2522) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2597) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2447) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2372) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2522) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_internal_grad_fn_2597) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
[2024-11-24 13:46:51,175] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:46:51,175] DEEPMD INFO    # testing system : ../data/data_1
[2024-11-24 13:46:59,689] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-11-24 13:47:03,304] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-11-24 13:47:07,243] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-11-24 13:47:11,693] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-11-24 13:47:16,897] DEEPMD INFO    # number of test data : 160 
[2024-11-24 13:47:16,898] DEEPMD INFO    Energy MAE         : 6.760290e-01 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Energy RMSE        : 8.614848e-01 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Energy MAE/Natoms  : 3.520984e-03 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Energy RMSE/Natoms : 4.486900e-03 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Force  MAE         : 5.224798e-01 eV/A
[2024-11-24 13:47:16,898] DEEPMD INFO    Force  RMSE        : 6.874469e-01 eV/A
[2024-11-24 13:47:16,898] DEEPMD INFO    Virial MAE         : 1.055268e+02 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Virial RMSE        : 1.807128e+02 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Virial MAE/Natoms  : 5.496190e-01 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    Virial RMSE/Natoms : 9.412124e-01 eV
[2024-11-24 13:47:16,898] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:47:16,898] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:47:16,898] DEEPMD INFO    # testing system : ../data/data_2
[2024-11-24 13:47:22,023] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:47:22,023] DEEPMD INFO    Energy MAE         : 5.619077e-01 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Energy RMSE        : 7.171949e-01 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Energy MAE/Natoms  : 2.926602e-03 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Energy RMSE/Natoms : 3.735390e-03 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Force  MAE         : 5.206556e-01 eV/A
[2024-11-24 13:47:22,023] DEEPMD INFO    Force  RMSE        : 6.828324e-01 eV/A
[2024-11-24 13:47:22,023] DEEPMD INFO    Virial MAE         : 1.053576e+02 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Virial RMSE        : 1.804098e+02 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Virial MAE/Natoms  : 5.487373e-01 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    Virial RMSE/Natoms : 9.396344e-01 eV
[2024-11-24 13:47:22,023] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:47:22,023] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:47:22,023] DEEPMD INFO    # testing system : ../data/data_0
[2024-11-24 13:47:22,497] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:47:22,497] DEEPMD INFO    Energy MAE         : 7.195613e-01 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Energy RMSE        : 8.736386e-01 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Energy MAE/Natoms  : 3.747715e-03 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Energy RMSE/Natoms : 4.550201e-03 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Force  MAE         : 5.222212e-01 eV/A
[2024-11-24 13:47:22,497] DEEPMD INFO    Force  RMSE        : 6.863119e-01 eV/A
[2024-11-24 13:47:22,497] DEEPMD INFO    Virial MAE         : 1.055419e+02 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Virial RMSE        : 1.807917e+02 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Virial MAE/Natoms  : 5.496976e-01 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    Virial RMSE/Natoms : 9.416233e-01 eV
[2024-11-24 13:47:22,497] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:47:22,497] DEEPMD INFO    # ---------------output of dp test--------------- 
[2024-11-24 13:47:22,497] DEEPMD INFO    # testing system : ../data/data_3
[2024-11-24 13:47:22,969] DEEPMD INFO    # number of test data : 80 
[2024-11-24 13:47:22,969] DEEPMD INFO    Energy MAE         : 6.737088e-01 eV
[2024-11-24 13:47:22,969] DEEPMD INFO    Energy RMSE        : 8.147767e-01 eV
[2024-11-24 13:47:22,969] DEEPMD INFO    Energy MAE/Natoms  : 3.508900e-03 eV
[2024-11-24 13:47:22,969] DEEPMD INFO    Energy RMSE/Natoms : 4.243629e-03 eV
[2024-11-24 13:47:22,969] DEEPMD INFO    Force  MAE         : 5.249346e-01 eV/A
[2024-11-24 13:47:22,969] DEEPMD INFO    Force  RMSE        : 6.895287e-01 eV/A
[2024-11-24 13:47:22,969] DEEPMD INFO    Virial MAE         : 1.051727e+02 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial RMSE        : 1.802760e+02 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial MAE/Natoms  : 5.477746e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial RMSE/Natoms : 9.389376e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    # ----------------------------------------------- 
[2024-11-24 13:47:22,970] DEEPMD INFO    # ----------weighted average of errors----------- 
[2024-11-24 13:47:22,970] DEEPMD INFO    # number of systems : 4
[2024-11-24 13:47:22,970] DEEPMD INFO    Energy MAE         : 6.614471e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Energy RMSE        : 8.277423e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Energy MAE/Natoms  : 3.445037e-03 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Energy RMSE/Natoms : 4.311158e-03 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Force  MAE         : 5.225542e-01 eV/A
[2024-11-24 13:47:22,970] DEEPMD INFO    Force  RMSE        : 6.867169e-01 eV/A
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial MAE         : 1.054252e+02 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial RMSE        : 1.805807e+02 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial MAE/Natoms  : 5.490895e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    Virial RMSE/Natoms : 9.405246e-01 eV
[2024-11-24 13:47:22,970] DEEPMD INFO    # -----------------------------------------------
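
The `weighted average of errors` block can be reproduced by hand from the per-system numbers. The sketch below assumes that each system is weighted by its number of test frames, that MAEs are averaged linearly, and that RMSEs are combined through their squares. This weighting is inferred from the logged values, not taken from the DeePMD-kit source. All numbers are copied from the log above.

```python
import math

# (number of test frames, Energy MAE in eV, Energy RMSE in eV), copied from the log
systems = {
    "data_1": (160, 6.760290e-01, 8.614848e-01),
    "data_2": (80, 5.619077e-01, 7.171949e-01),
    "data_0": (80, 7.195613e-01, 8.736386e-01),
    "data_3": (80, 6.737088e-01, 8.147767e-01),
}

total_frames = sum(n for n, _, _ in systems.values())

# MAE: frame-weighted arithmetic mean (assumed weighting)
weighted_mae = sum(n * mae for n, mae, _ in systems.values()) / total_frames

# RMSE: square root of the frame-weighted mean of the squared RMSEs (assumed weighting)
weighted_rmse = math.sqrt(
    sum(n * rmse**2 for n, _, rmse in systems.values()) / total_frames
)

print(f"Energy MAE  : {weighted_mae:.6e} eV")
print(f"Energy RMSE : {weighted_rmse:.6e} eV")
```

Under these assumptions the results should match the logged `Energy MAE` and `Energy RMSE` averages up to the rounding of the per-system inputs.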

LAMMPS molecular dynamics speed benchmark

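Although these quickly trained models are hardly usable in production, we can still benchmark the speed of the models from the different backends.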

[27]

%cd ../lmp

/deepmd-kit/examples/water/lmp

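Here we do not apply any time-integration fix such as NVE or NVT, so the atomic coordinates are identical at every step of the simulation. For each model, we first run 100 steps as a warm-up and then run 500 steps for the actual timing.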
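
Because every backend is benchmarked with the same protocol, the LAMMPS input can be generated from one template in which only the model path changes. The following is a minimal sketch of that idea. `render_benchmark_input`, `TEMPLATE`, and the parameter names are hypothetical helpers introduced here for illustration and are not part of DeePMD-kit or LAMMPS. The commands themselves mirror the input files used in this section.

```python
# Template for the benchmark input; only the model path and step counts vary.
TEMPLATE = """\
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd {model}
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run {warmup_steps}
run {measure_steps}
"""


def render_benchmark_input(model, warmup_steps=100, measure_steps=500):
    """Return the LAMMPS input text for one model (hypothetical helper)."""
    return TEMPLATE.format(
        model=model, warmup_steps=warmup_steps, measure_steps=measure_steps
    )


tf_input = render_benchmark_input("../se_atten_compressible/frozen_model.pb")
print(tf_input)
```

Writing the returned text to a file such as `tf.in` and running `lmp -in tf.in` would then be equivalent to the heredoc-based cells in this section.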

[48]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

cat<<EOF > tf.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.pb
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run 100
run 500
EOF

lmp -in tf.in

[bohrium-156-1225901:03558] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3820224512/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03558] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:24:14.762167: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:24:14.781782: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:24:14.787952: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
  triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  12288 atoms
  replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /root/deepmd-kit
  source:             
  source branch:      HEAD
  source commit:      b1be266
  source commit at:   2024-11-23 01:37:55 -0800
  support model ver.: 1.1 
  build variant:      cuda
  build with tf inc:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
  build with tf lib:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
  build with pt lib:  torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 14:24:18.624530: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732429458.625574    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429458.627669    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429458.627870    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429458.628048    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429458.628174    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429458.628311    3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-24 14:24:18.628434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 14:24:18.663748: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
  use deepmd-kit at:  /root/deepmd-kit  >>> Info of model(s):
  using   1 model(s): ../se_atten_compressible/frozen_model.pb 
  rcut in model:      6
  ntypes in model:    2

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 50 steps, delay = 0 steps, check = no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 17 17 17
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        20  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        40  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        60  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        80  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       100  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
Loop time of 14.5451 on 1 procs for 100 steps with 12288 atoms

Performance: 0.297 ns/day, 80.806 hours/ns, 6.875 timesteps/s, 84.482 katom-step/s
46.5% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 14.464     | 14.464     | 14.464     |   0.0 | 99.45
Neigh   | 0.066192   | 0.066192   | 0.066192   |   0.0 |  0.46
Comm    | 0.011048   | 0.011048   | 0.011048   |   0.0 |  0.08
Output  | 0.00050548 | 0.00050548 | 0.00050548 |   0.0 |  0.00
Modify  | 7.5344e-05 | 7.5344e-05 | 7.5344e-05 |   0.0 |  0.00
Other   |            | 0.00289    |            |       |  0.02

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 100
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
       100  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       120  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       140  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       160  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       180  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       200  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       220  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       240  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       260  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       280  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       300  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       320  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       340  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       360  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       380  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       400  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       420  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       440  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       460  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       480  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       500  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       520  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       540  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       560  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       580  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       600  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
Loop time of 72.6601 on 1 procs for 500 steps with 12288 atoms

Performance: 0.297 ns/day, 80.733 hours/ns, 6.881 timesteps/s, 84.558 katom-step/s
46.4% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 72.258     | 72.258     | 72.258     |   0.0 | 99.45
Neigh   | 0.33104    | 0.33104    | 0.33104    |   0.0 |  0.46
Comm    | 0.054271   | 0.054271   | 0.054271   |   0.0 |  0.07
Output  | 0.0024261  | 0.0024261  | 0.0024261  |   0.0 |  0.00
Modify  | 0.00036951 | 0.00036951 | 0.00036951 |   0.0 |  0.00
Other   |            | 0.01433    |            |       |  0.02

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:01:35
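
The `Performance:` line can be derived from the `Loop time` line, which is useful when comparing backends with a script. The sketch below parses the two lines of the 500-step run copied from the log above and recomputes the reported figures. It assumes the `timestep 0.0005` set in the input is in picoseconds, as is the case for `units metal`. The variable names are illustrative.

```python
import re

# Two lines copied from the LAMMPS log above (the 500-step measurement run)
log = """\
Loop time of 72.6601 on 1 procs for 500 steps with 12288 atoms
Performance: 0.297 ns/day, 80.733 hours/ns, 6.881 timesteps/s, 84.558 katom-step/s
"""

TIMESTEP_PS = 0.0005  # `timestep 0.0005` with `units metal` is 0.0005 ps

match = re.search(
    r"Loop time of ([\d.]+) on \d+ procs for (\d+) steps with (\d+) atoms", log
)
loop_time = float(match.group(1))  # wall time in seconds
steps = int(match.group(2))
atoms = int(match.group(3))

steps_per_s = steps / loop_time
katom_steps_per_s = steps_per_s * atoms / 1000
ns_per_day = steps_per_s * TIMESTEP_PS * 86400 / 1000  # ps per second -> ns per day
hours_per_ns = 24 / ns_per_day

print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hours/ns, "
      f"{steps_per_s:.3f} timesteps/s, {katom_steps_per_s:.3f} katom-step/s")
```

Applying the same parsing to the logs of the other models gives directly comparable numbers, since the system size and timestep are identical across the runs.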

[49]

%%bash

source /root/deepmd-kit/bin/activate /root/deepmd-kit

cat<<EOF > tfc.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model_compressed.pb
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run 100
run 500
EOF

lmp -in tfc.in

[bohrium-156-1225901:03665] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/4094951424/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03665] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:25:52.964270: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:25:52.983780: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:25:52.989813: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
  triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  12288 atoms
  replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /root/deepmd-kit
  source:             
  source branch:      HEAD
  source commit:      b1be266
  source commit at:   2024-11-23 01:37:55 -0800
  support model ver.: 1.1 
  build variant:      cuda
  build with tf inc:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
  build with tf lib:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
  build with pt lib:  torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 50 steps, delay = 0 steps, check = no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 17 17 17
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        20  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        40  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        60  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
        80  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       100  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
Loop time of 3.94721 on 1 procs for 100 steps with 12288 atoms

Performance: 1.094 ns/day, 21.929 hours/ns, 25.334 timesteps/s, 311.309 katom-step/s
67.0% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 3.8685     | 3.8685     | 3.8685     |   0.0 | 98.01
Neigh   | 0.066096   | 0.066096   | 0.066096   |   0.0 |  1.67
Comm    | 0.0098926  | 0.0098926  | 0.0098926  |   0.0 |  0.25
Output  | 0.00047323 | 0.00047323 | 0.00047323 |   0.0 |  0.01
Modify  | 5.4836e-05 | 5.4836e-05 | 5.4836e-05 |   0.0 |  0.00
Other   |            | 0.00223    |            |       |  0.06

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 100
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
       100  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       120  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       140  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       160  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       180  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       200  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       220  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       240  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       260  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       280  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       300  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       320  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       340  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       360  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       380  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       400  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       420  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       440  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       460  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       480  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       500  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       520  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       540  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       560  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       580  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
       600  -1916365.1      524.1124      -1915841        330           -42650.41       123348.33    
Loop time of 19.6433 on 1 procs for 500 steps with 12288 atoms

Performance: 1.100 ns/day, 21.826 hours/ns, 25.454 timesteps/s, 312.779 katom-step/s
67.2% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 19.251     | 19.251     | 19.251     |   0.0 | 98.01
Neigh   | 0.33065    | 0.33065    | 0.33065    |   0.0 |  1.68
Comm    | 0.047609   | 0.047609   | 0.047609   |   0.0 |  0.24
Output  | 0.0023861  | 0.0023861  | 0.0023861  |   0.0 |  0.01
Modify  | 0.00031468 | 0.00031468 | 0.00031468 |   0.0 |  0.00
Other   |            | 0.01086    |            |       |  0.06

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:00:32
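
The throughput figures in the output above can be collected programmatically instead of being read off by eye. The following is a minimal sketch, assuming the screen output has been captured as text; `parse_runs` and the two regular expressions are illustrative helpers, not part of LAMMPS or DeePMD-kit. It also checks that `timesteps/s` is simply the number of steps divided by the loop time.

```python
import re

# Patterns for the two summary lines LAMMPS prints after each `run`.
LOOP_RE = re.compile(
    r"Loop time of ([\d.]+) on (\d+) procs for (\d+) steps with (\d+) atoms"
)
PERF_RE = re.compile(
    r"Performance:\s+([\d.]+) ns/day,\s+([\d.]+) hours/ns,\s+"
    r"([\d.]+) timesteps/s,\s+([\d.]+) katom-step/s"
)


def parse_runs(log_text):
    """Return one dict per completed `run` found in LAMMPS screen output."""
    runs = []
    current = None
    for line in log_text.splitlines():
        loop = LOOP_RE.search(line)
        if loop:
            current = {
                "loop_time_s": float(loop.group(1)),
                "procs": int(loop.group(2)),
                "steps": int(loop.group(3)),
                "atoms": int(loop.group(4)),
            }
            runs.append(current)
            continue
        perf = PERF_RE.search(line)
        if perf and current is not None:
            current.update(
                ns_per_day=float(perf.group(1)),
                hours_per_ns=float(perf.group(2)),
                timesteps_per_s=float(perf.group(3)),
                katom_step_per_s=float(perf.group(4)),
            )
    return runs


# Two lines copied from the 500-step run above.
sample = """\
Loop time of 19.6433 on 1 procs for 500 steps with 12288 atoms

Performance: 1.100 ns/day, 21.826 hours/ns, 25.454 timesteps/s, 312.779 katom-step/s
"""

run = parse_runs(sample)[0]
# Reported throughput next to steps / loop time; the two should agree.
print(run["timesteps_per_s"], run["steps"] / run["loop_time_s"])
```

In practice the same function could be applied to the full text of each backend's output to build a comparison table.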

[50]

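%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
cat<<EOF > pt.in
# Benchmark input: PyTorch-backend model (frozen_model.pth)
units metal
boundary p p p
atom_style atomic
# zero skin distance; rebuild the neighbor list every 50 steps without checking
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# replicate the 192-atom cell to 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.pth
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# no time-integration fix is defined, so atoms do not move; both runs mainly time the pair (force) evaluation
run 100
run 500
EOF
lmp -in pt.in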

[bohrium-156-1225901:03779] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3188326400/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03779] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:26:27.643570: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:26:27.663144: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:26:27.669205: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
  triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  12288 atoms
  replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /root/deepmd-kit
  source:             
  source branch:      HEAD
  source commit:      b1be266
  source commit at:   2024-11-23 01:37:55 -0800
  support model ver.: 1.1 
  build variant:      cuda
  build with tf inc:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
  build with tf lib:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
  build with pt lib:  torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at:  /root/deepmd-kitload model from: ../se_atten_compressible/frozen_model.pth to gpu 0
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
  >>> Info of model(s):
  using   1 model(s): ../se_atten_compressible/frozen_model.pth 
  rcut in model:      6
  ntypes in model:    2

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 50 steps, delay = 0 steps, check = no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 17 17 17
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        20  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        40  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        60  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        80  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 15.9732 on 1 procs for 100 steps with 12288 atoms

Performance: 0.270 ns/day, 88.740 hours/ns, 6.260 timesteps/s, 76.929 katom-step/s
64.4% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 15.891     | 15.891     | 15.891     |   0.0 | 99.49
Neigh   | 0.065974   | 0.065974   | 0.065974   |   0.0 |  0.41
Comm    | 0.011833   | 0.011833   | 0.011833   |   0.0 |  0.07
Output  | 0.0005018  | 0.0005018  | 0.0005018  |   0.0 |  0.00
Modify  | 8.3108e-05 | 8.3108e-05 | 8.3108e-05 |   0.0 |  0.00
Other   |            | 0.003758   |            |       |  0.02

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 100
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       120  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       140  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       160  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       180  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       200  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       220  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       240  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       260  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       280  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       300  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       320  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       340  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       360  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       380  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       400  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       420  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       440  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       460  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       480  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       500  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       520  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       540  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       560  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       580  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       600  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 70.5811 on 1 procs for 500 steps with 12288 atoms

Performance: 0.306 ns/day, 78.423 hours/ns, 7.084 timesteps/s, 87.049 katom-step/s
58.3% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 70.171     | 70.171     | 70.171     |   0.0 | 99.42
Neigh   | 0.32965    | 0.32965    | 0.32965    |   0.0 |  0.47
Comm    | 0.058627   | 0.058627   | 0.058627   |   0.0 |  0.08
Output  | 0.0024118  | 0.0024118  | 0.0024118  |   0.0 |  0.00
Modify  | 0.00041647 | 0.00041647 | 0.00041647 |   0.0 |  0.00
Other   |            | 0.0185     |            |       |  0.03

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:01:33
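
The log above warns repeatedly that `OMP_NUM_THREADS`, `DP_INTRA_OP_PARALLELISM_THREADS` and `DP_INTER_OP_PARALLELISM_THREADS` are not set. One way to set them for a benchmark launched from Python is sketched below. The helper `lammps_env` is hypothetical and the thread counts are placeholder assumptions, not tuned recommendations; suitable values depend on the machine.

```python
import os


def lammps_env(omp_threads, intra_op_threads, inter_op_threads, base=None):
    """Copy an environment and set the three variables named in the warnings."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    env["DP_INTRA_OP_PARALLELISM_THREADS"] = str(intra_op_threads)
    env["DP_INTER_OP_PARALLELISM_THREADS"] = str(inter_op_threads)
    return env


# Placeholder values for illustration only.
env = lammps_env(omp_threads=4, intra_op_threads=4, inter_op_threads=2, base={})
for name in sorted(env):
    print(name, "=", env[name])

# To launch the benchmark with these settings (requires `lmp` on PATH):
# import subprocess
# subprocess.run(["lmp", "-in", "pt.in"], env=lammps_env(4, 4, 2), check=True)
```

Passing a copied environment keeps the settings local to the benchmark process instead of changing the notebook kernel's own environment.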

[51]

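%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
cat<<EOF > ptc.in
# Benchmark input: compressed PyTorch-backend model (frozen_model_compressed.pth)
units metal
boundary p p p
atom_style atomic
# zero skin distance; rebuild the neighbor list every 50 steps without checking
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# replicate the 192-atom cell to 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model_compressed.pth
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# no time-integration fix is defined, so atoms do not move; both runs mainly time the pair (force) evaluation
run 100
run 500
EOF
lmp -in ptc.in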

[bohrium-156-1225901:03792] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/643170304/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03792] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:28:03.862375: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:28:03.881941: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:28:03.887974: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
  triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  12288 atoms
  replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /root/deepmd-kit
  source:             
  source branch:      HEAD
  source commit:      b1be266
  source commit at:   2024-11-23 01:37:55 -0800
  support model ver.: 1.1 
  build variant:      cuda
  build with tf inc:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
  build with tf lib:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
  build with pt lib:  torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at:  /root/deepmd-kitload model from: ../se_atten_compressible/frozen_model_compressed.pth to gpu 0
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
  >>> Info of model(s):
  using   1 model(s): ../se_atten_compressible/frozen_model_compressed.pth 
  rcut in model:      6
  ntypes in model:    2

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 50 steps, delay = 0 steps, check = no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 17 17 17
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        20  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        40  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        60  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        80  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 6.43413 on 1 procs for 100 steps with 12288 atoms

Performance: 0.671 ns/day, 35.745 hours/ns, 15.542 timesteps/s, 190.982 katom-step/s
73.6% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 6.3528     | 6.3528     | 6.3528     |   0.0 | 98.74
Neigh   | 0.065845   | 0.065845   | 0.065845   |   0.0 |  1.02
Comm    | 0.011392   | 0.011392   | 0.011392   |   0.0 |  0.18
Output  | 0.00051878 | 0.00051878 | 0.00051878 |   0.0 |  0.01
Modify  | 6.772e-05  | 6.772e-05  | 6.772e-05  |   0.0 |  0.00
Other   |            | 0.003543   |            |       |  0.06

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 100
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       120  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       140  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       160  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       180  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       200  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       220  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       240  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       260  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       280  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       300  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       320  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       340  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       360  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       380  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       400  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       420  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       440  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       460  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       480  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       500  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       520  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       540  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       560  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       580  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       600  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 23.764 on 1 procs for 500 steps with 12288 atoms

Performance: 0.909 ns/day, 26.404 hours/ns, 21.040 timesteps/s, 258.542 katom-step/s
65.9% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 23.358     | 23.358     | 23.358     |   0.0 | 98.29
Neigh   | 0.33099    | 0.33099    | 0.33099    |   0.0 |  1.39
Comm    | 0.055314   | 0.055314   | 0.055314   |   0.0 |  0.23
Output  | 0.0024087  | 0.0024087  | 0.0024087  |   0.0 |  0.01
Modify  | 0.00037167 | 0.00037167 | 0.00037167 |   0.0 |  0.00
Other   |            | 0.01714    |            |       |  0.07

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:00:37
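
The effect of compression on the PyTorch model can be read directly from the two 500-step runs above. The short calculation below uses only the `timesteps/s` values printed in those logs, plus the 0.0005 ps timestep from the input files. `ns_per_day` is an illustrative helper that reproduces the `ns/day` column from `timesteps/s`.

```python
# timesteps/s reported for the 500-step runs in the two logs above
uncompressed = 7.084   # frozen_model.pth
compressed = 21.040    # frozen_model_compressed.pth


def ns_per_day(timesteps_per_s, timestep_ps=0.0005):
    """Convert throughput to simulated ns per wall-clock day (metal units: ps)."""
    seconds_per_day = 86400
    ps_per_ns = 1000
    return timesteps_per_s * timestep_ps * seconds_per_day / ps_per_ns


speedup = compressed / uncompressed
print(f"compressed / uncompressed throughput: {speedup:.2f}x")
print(f"uncompressed: {ns_per_day(uncompressed):.3f} ns/day")
print(f"compressed:   {ns_per_day(compressed):.3f} ns/day")
```

For this 12288-atom system on the V100 node used here, the compressed model runs roughly three times faster than the uncompressed one, while the reported energies and pressure are identical to the printed precision.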

[47]

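%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
cat<<EOF > jax.in
# Benchmark input: JAX-backend model (frozen_model.savedmodel)
units metal
boundary p p p
atom_style atomic
# zero skin distance; rebuild the neighbor list every 50 steps without checking
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# replicate the 192-atom cell to 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.savedmodel
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# no time-integration fix is defined, so atoms do not move; both runs mainly time the pair (force) evaluation
run 100
run 500
EOF
lmp -in jax.in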

[bohrium-156-1225901:03478] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3465609216/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03478] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:22:17.578334: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:22:17.597960: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:22:17.604057: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
  triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
  triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
  1 by 1 by 1 MPI processor grid
  12288 atoms
  replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /root/deepmd-kit
  source:             
  source branch:      HEAD
  source commit:      b1be266
  source commit at:   2024-11-23 01:37:55 -0800
  support model ver.: 1.1 
  build variant:      cuda
  build with tf inc:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
  build with tf lib:  /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
  build with pt lib:  torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 14:22:17.636243: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:17.669163: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2024-11-24 14:22:17.669196: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:17.669257: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732429337.670136    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429337.672073    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429337.672261    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772416    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772625    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772778    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-24 14:22:21.772908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 14:22:21.918753: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 14:22:21.923572: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2024-11-24 14:22:22.005609: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:22.077173: I tensorflow/cc/saved_model/loader.cc:462] SavedModel load for tags { serve }; Status: success: OK. Took 4440933 microseconds.
I0000 00:00:1732429342.134970    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135203    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135324    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135489    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135609    3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-24 14:22:22.135746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
  use deepmd-kit at:  /root/deepmd-kit  >>> Info of model(s):
  using   1 model(s): ../se_atten_compressible/frozen_model.savedmodel 
  rcut in model:      6
  ntypes in model:    2
I0000 00:00:1732429342.378001    3521 service.cc:146] XLA service 0x7fd5f00039c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1732429342.378042    3521 service.cc:154]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-11-24 14:22:22.527898: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-11-24 14:22:22.576620: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 90300
I0000 00:00:1732429346.413280    3521 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 50 steps, delay = 0 steps, check = no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 17 17 17
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
         0  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        20  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        40  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        60  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
        80  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 13.9058 on 1 procs for 100 steps with 12288 atoms

Performance: 0.311 ns/day, 77.254 hours/ns, 7.191 timesteps/s, 88.366 katom-step/s
66.0% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 13.821     | 13.821     | 13.821     |   0.0 | 99.39
Neigh   | 0.065975   | 0.065975   | 0.065975   |   0.0 |  0.47
Comm    | 0.014234   | 0.014234   | 0.014234   |   0.0 |  0.10
Output  | 0.00047766 | 0.00047766 | 0.00047766 |   0.0 |  0.00
Modify  | 0.00011797 | 0.00011797 | 0.00011797 |   0.0 |  0.00
Other   |            | 0.004056   |            |       |  0.03

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 100
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
   Step         PotEng         KinEng         TotEng          Temp          Press          Volume    
       100  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       120  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       140  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       160  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       180  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       200  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       220  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       240  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       260  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       280  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       300  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       320  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       340  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       360  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       380  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       400  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       420  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       440  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       460  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       480  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       500  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       520  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       540  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       560  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       580  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
       600  -1916323        524.1124      -1915798.8      330           -253197.37      123348.33    
Loop time of 69.3794 on 1 procs for 500 steps with 12288 atoms

Performance: 0.311 ns/day, 77.088 hours/ns, 7.207 timesteps/s, 88.557 katom-step/s
65.7% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 68.955     | 68.955     | 68.955     |   0.0 | 99.39
Neigh   | 0.33016    | 0.33016    | 0.33016    |   0.0 |  0.48
Comm    | 0.07073    | 0.07073    | 0.07073    |   0.0 |  0.10
Output  | 0.0024476  | 0.0024476  | 0.0024476  |   0.0 |  0.00
Modify  | 0.0005643  | 0.0005643  | 0.0005643  |   0.0 |  0.00
Other   |            | 0.02045    |            |       |  0.03

Nlocal:          12288 ave       12288 max       12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:          11142 ave       11142 max       11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:01:33


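As the runs above show, the different backends give very similar results in this speed test, for both the uncompressed and the compressed models. Keep in mind that behavior can vary widely between models and between systems, so this result does not necessarily hold in every case.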
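To compare runs across backends without reading the numbers by eye, the throughput figures can be recomputed from the `Loop time of ...` line that LAMMPS prints at the end of each run. The following is only a minimal sketch: the function name `summarize`, the regular expressions, and the embedded excerpt (copied from the 500-step run above) are illustrative assumptions, not part of DeePMD-kit or LAMMPS. It also assumes `units metal`, where the time step is given in picoseconds, as reported in the log.

```python
import re

# Excerpt copied from the LAMMPS output above (second run, 500 steps).
LOG_EXCERPT = """\
  Time step     : 0.0005
Loop time of 69.3794 on 1 procs for 500 steps with 12288 atoms
"""

# Hypothetical patterns written for this sketch; they match the two log lines above.
LOOP_RE = re.compile(
    r"Loop time of (?P<seconds>[\d.eE+-]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)
TIMESTEP_RE = re.compile(r"Time step\s*:\s*(?P<dt>[\d.eE+-]+)")


def summarize(log_text):
    """Return throughput figures derived from the last 'Loop time' line."""
    loops = list(LOOP_RE.finditer(log_text))
    timesteps = TIMESTEP_RE.findall(log_text)
    if not loops or not timesteps:
        raise ValueError("log has no 'Loop time' or 'Time step' line")
    last = loops[-1]
    seconds = float(last["seconds"])
    steps = int(last["steps"])
    atoms = int(last["atoms"])
    dt_ps = float(timesteps[-1])  # assumption: metal units, so dt is in ps
    steps_per_s = steps / seconds
    return {
        "timesteps_per_s": steps_per_s,
        "katom_step_per_s": steps_per_s * atoms / 1e3,
        "ns_per_day": steps_per_s * dt_ps * 86400 / 1e3,
    }


summary = summarize(LOG_EXCERPT)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Applied to the excerpt, the derived values should agree with the `Performance:` line of the same run to within rounding. Running the same function on the log of each backend gives numbers that can be compared directly, for example as a ratio of `katom_step_per_s` values.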
