[DeePMD-kit v3 Tutorial 1] Multi-Backend Framework: Usage Tutorial


Updated on 2024-11-24
Recommended image: Basic Image:ubuntu22.04-py3.10
Recommended machine type: c12_m92_1 * NVIDIA V100
Installation
[1]
!nvidia-smi
Sun Nov 24 13:02:01 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:09.0 Off |                    0 |
| N/A   33C    P0    36W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Here we use a single V100 node for the demonstration and install the GPU build of DeePMD-kit.
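Before installing a GPU build, it can help to confirm which CUDA version the driver reports. The sketch below parses the `CUDA Version:` field from `nvidia-smi` banner text and compares its major version with the one the installer targets. The function names are hypothetical, and treating a matching major version as sufficient is an assumption based on NVIDIA's minor-version compatibility policy, not a guarantee.

```python
import re
from typing import Optional


def cuda_major_from_smi(smi_text: str) -> Optional[int]:
    """Return the CUDA major version found in nvidia-smi banner text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return int(match.group(1)) if match else None


def driver_supports(smi_text: str, installer_cuda_major: int = 12) -> bool:
    """Assumption: a matching CUDA major version is enough for the installer."""
    major = cuda_major_from_smi(smi_text)
    return major is not None and major == installer_cuda_major


# Banner line as printed by nvidia-smi on the node used in this tutorial.
banner = "| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |"
print(cuda_major_from_smi(banner), driver_supports(banner))
```

In a notebook, the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of a hard-coded string.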
[2]
!wget https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0 -O deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0
!wget https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1 -O deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1
!cat deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0 deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1 > deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh
--2024-11-24 13:02:28--  https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/4858a092-c590-4b8f-a9d9-a3ee36f1e2eb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050228Z&X-Amz-Expires=300&X-Amz-Signature=a5676bb7269222f4c3e1ac0f25ac57d4a7d74e8846a3eeda41f5b2b7777f537b&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0&response-content-type=application%2Foctet-stream [following]
--2024-11-24 13:02:28--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/4858a092-c590-4b8f-a9d9-a3ee36f1e2eb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050228Z&X-Amz-Expires=300&X-Amz-Signature=a5676bb7269222f4c3e1ac0f25ac57d4a7d74e8846a3eeda41f5b2b7777f537b&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0&response-content-type=application%2Foctet-stream
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1593874464 (1.5G) [application/octet-stream]
Saving to: ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0’

deepmd-kit-3.0.0-cu 100%[===================>]   1.48G  3.50MB/s    in 6m 38s

2024-11-24 13:09:08 (3.82 MB/s) - ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0’ saved [1593874464/1593874464]

--2024-11-24 13:09:08--  https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0/deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.7, 10.255.254.18
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/ab02be0e-f4b9-44db-a593-bfcc82fb194b?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050909Z&X-Amz-Expires=300&X-Amz-Signature=d1b36bf592d4fa6719d18177f8136d42dd05f4a454fc730ce9ed337c756bedad&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1&response-content-type=application%2Foctet-stream [following]
--2024-11-24 13:09:09--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/114006193/ab02be0e-f4b9-44db-a593-bfcc82fb194b?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241124T050909Z&X-Amz-Expires=300&X-Amz-Signature=d1b36bf592d4fa6719d18177f8136d42dd05f4a454fc730ce9ed337c756bedad&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Ddeepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1&response-content-type=application%2Foctet-stream
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1593874465 (1.5G) [application/octet-stream]
Saving to: ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1’

deepmd-kit-3.0.0-cu 100%[===================>]   1.48G  5.36MB/s    in 5m 53s

2024-11-24 13:15:04 (4.30 MB/s) - ‘deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1’ saved [1593874465/1593874465]
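The installer is shipped in two parts that are joined with `cat`. As a rough Python equivalent, the sketch below concatenates the parts in order and returns the size of the result so it can be compared with the sum of the part sizes. `join_parts` is a hypothetical helper written for this illustration.

```python
import shutil
from pathlib import Path


def join_parts(parts, target):
    """Concatenate files in the given order, like `cat a b > target`.

    Returns the size of the joined file in bytes.
    """
    target = Path(target)
    with target.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Stream in chunks so multi-gigabyte parts never sit in memory.
                shutil.copyfileobj(src, out)
    return target.stat().st_size
```

For the download above, the two parts are 1593874464 and 1593874465 bytes, so a complete installer should be 3187748929 bytes. A call such as `join_parts(["deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.0", "deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh.1"], "deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh")` would be expected to return that value.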
[3]
!sh deepmd-kit-3.0.0-cuda126-Linux-x86_64.sh -b
PREFIX=/root/deepmd-kit
Unpacking payload ...
Notes:
The off-line packages and conda packages require the GNU C Library 2.17 or above[1]. The GPU version requires compatible NVIDIA driver to be installed in advance[2]. It is possible to force conda to override detection when installation[3] (such as CONDA_OVERRIDE_CUDA), but these requirements are still necessary during runtime.
[1] The GNU C Library. https://www.gnu.org/software/libc/
[2] Minor Version Compatibility. NVIDIA Data Center GPU Driver Documentation. https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility
[3] Overriding detected packages. conda documentation. https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html

To enable CUDA support, UCX requires the CUDA Runtime library (libcudart).
The library can be installed with the appropriate command below:

* For CUDA 11, run:    conda install cudatoolkit cuda-version=11
* For CUDA 12, run:    conda install cuda-cudart cuda-version=12

To enable CUDA support, please follow UCX's instruction above.

To additionally enable NCCL support, run:    conda install nccl

On Linux, Open MPI is built with CUDA awareness but it is disabled by default.
To enable it, please set the environment variable
OMPI_MCA_opal_cuda_support=true
before launching your MPI processes.
Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via
UCX. Please consult UCX documentation for further details.

done
Please activate the environment before using the packages:
source /path/to/deepmd-kit/bin/activate /path/to/deepmd-kit
This package enables TensorFlow, PyTorch, and JAX backends.
The following executable files have been installed:
1. DeePMD-kit CLi: dp -h
2. LAMMPS: lmp -h
3. DeePMD-kit i-Pi interface: dp_ipi
4. MPICH: mpirun -h
5. Horovod: horovod -h
The following Python libraries have been installed:
1. deepmd
2. dpdata
3. pylammps
If you have any questions, seek help from https://github.com/deepmodeling/deepmd-kit/discussions
installation finished.
The code cells that follow all begin with
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
to activate the environment.
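Outside of `%%bash` cells, the same activation can be prepended to a command from Python. This is only a sketch: `activated_command` and `run_in_env` are hypothetical helpers, and the prefix `/root/deepmd-kit` is the one reported by the installer above.

```python
import shlex
import subprocess

ENV_PREFIX = "/root/deepmd-kit"  # install prefix printed by the installer


def activated_command(args, prefix=ENV_PREFIX):
    """Build a bash command line that activates the environment, then runs args."""
    activate = f"source {shlex.quote(prefix + '/bin/activate')} {shlex.quote(prefix)}"
    return f"{activate} && {shlex.join(args)}"


def run_in_env(args, prefix=ENV_PREFIX):
    """Run args inside the activated environment; raises if the command fails."""
    return subprocess.run(
        ["bash", "-c", activated_command(args, prefix)],
        check=True,
        capture_output=True,
        text=True,
    )


print(activated_command(["dp", "-h"]))
```

`bash` is invoked explicitly because `source` is a shell builtin and would not be found if the command were executed directly.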
Here we fix the "libdevice not found at ./libdevice.10.bc" error by installing the nvidia-cuda-nvcc-cu12 package.
[10]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
pip install nvidia-cuda-nvcc-cu12
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting nvidia-cuda-nvcc-cu12
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/25/1f/faf9b791027ebd6354be68700da3c3d8a3b3db3bdcf2f8070f2e6871a7f1/nvidia_cuda_nvcc_cu12-12.6.85-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (21.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.2/21.2 MB 29.0 MB/s eta 0:00:00
Installing collected packages: nvidia-cuda-nvcc-cu12
Successfully installed nvidia-cuda-nvcc-cu12-12.6.85
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[4]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp -h
usage: dp [-h] [-b {tensorflow,tf,jax,pytorch,pt} | --tensorflow | --jax | --pytorch]
          [--version]
          {transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from,neighbor-stat,change-bias,train-nvnmd,gui,convert-backend,show}
          ...

DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics

options:
  -h, --help            show this help message and exit
  -b {tensorflow,tf,jax,pytorch,pt}, --backend {tensorflow,tf,jax,pytorch,pt}
                        The backend of the model. Default can be set by environment variable DP_BACKEND. (default: tensorflow)
  --tensorflow, --tf    Alias for --backend tensorflow (default: None)
  --jax                 Alias for --backend jax (default: None)
  --pytorch, --pt       Alias for --backend pytorch (default: None)
  --version             show program's version number and exit

Valid subcommands:
  {transfer,train,freeze,test,compress,doc-train-input,model-devi,convert-from,neighbor-stat,change-bias,train-nvnmd,gui,convert-backend,show}
    transfer            (Supported backend: TensorFlow) pass parameters to another model
    train               train a model
    freeze              freeze the model
    test                test the model
    compress            Compress a model
    doc-train-input     print the documentation (in rst format) of input training parameters.
    model-devi          calculate model deviation
    convert-from        (Supported backend: TensorFlow) convert lower model version to supported version
    neighbor-stat       Calculate neighbor statistics
    change-bias         (Supported backend: PyTorch) Change model out bias according to the input data.
    train-nvnmd         (Supported backend: TensorFlow) train nvnmd model
    gui                 Serve DP-GUI.
    convert-backend     Convert model to another backend.
    show                Show the information of a model

Use --tf or --pt to choose the backend:
    dp --tf train input.json
    dp --pt train input.json
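The help text above offers two ways to pick a backend: the `-b/--backend` option (with aliases such as `--tf` and `--pt`) and the `DP_BACKEND` environment variable for the default. The sketch below builds an argument list for either route. `dp_command` and `env_with_default_backend` are hypothetical helpers; only the option names and choices are taken from the help output.

```python
import os

# Choices listed by `dp -h` for -b/--backend.
VALID_BACKENDS = ("tensorflow", "tf", "jax", "pytorch", "pt")


def dp_command(subcommand, *args, backend=None):
    """Return an argv list for dp, optionally selecting the backend explicitly."""
    cmd = ["dp"]
    if backend is not None:
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        cmd += ["--backend", backend]
    cmd.append(subcommand)
    cmd.extend(args)
    return cmd


def env_with_default_backend(backend, base_env=None):
    """Return a copy of the environment with DP_BACKEND set as the default."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["DP_BACKEND"] = backend
    return env


print(dp_command("train", "input.json", backend="pt"))
```

Either result can be handed to `subprocess.run(cmd, env=env)` once the DeePMD-kit environment is active.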
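Multi-backend training, freezing, and compression
We use the se_atten_compressible example shipped with DeePMD-kit as a demonstration, with the number of training steps reduced from 1000000 to 1000.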
[5]
!git clone https://github.com/deepmodeling/deepmd-kit
Cloning into 'deepmd-kit'...
remote: Enumerating objects: 36456, done.
remote: Counting objects: 100% (1438/1438), done.
remote: Compressing objects: 100% (1050/1050), done.
remote: Total 36456 (delta 757), reused 811 (delta 384), pack-reused 35018 (from 1)
Receiving objects: 100% (36456/36456), 63.99 MiB | 5.33 MiB/s, done.
Resolving deltas: 100% (27045/27045), done.
[6]
%cd deepmd-kit/examples/water/se_atten_compressible
/deepmd-kit/examples/water/se_atten_compressible
[7]
!sed -i "s/1000000/1000/g" input.json
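The `sed` command replaces every occurrence of `1000000` in the file. A more targeted alternative is to edit the JSON directly, as sketched below. This assumes the step count is stored under `training.numb_steps`, which is how DeePMD-kit example inputs are usually laid out but is not shown in this notebook; the helper names are hypothetical.

```python
import copy
import json
from pathlib import Path


def set_training_steps(config: dict, steps: int) -> dict:
    """Return a copy of config with training.numb_steps (assumed key) set to steps."""
    updated = copy.deepcopy(config)
    updated.setdefault("training", {})["numb_steps"] = steps
    return updated


def rewrite_input(path, steps):
    """Load a JSON input file, change the step count, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(set_training_steps(config, steps), indent=2) + "\n")


# Minimal stand-in for input.json; the real file has many more sections.
example = {"training": {"numb_steps": 1000000, "disp_freq": 100}}
print(set_training_steps(example, 1000)["training"])
```

Unlike the `sed` approach, this leaves any other value that happens to contain `1000000` untouched.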
TensorFlow
[11]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --tf train input.json
2024-11-24 13:37:00.784789: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:37:00.802137: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:37:00.807370: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:37:00.819907: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[bohrium-156-1225901:00353] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1785462784/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00353] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[2024-11-24 13:37:09,606] DEEPMD INFO Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step)
[2024-11-24 13:37:09,680] DEEPMD INFO If you encounter the error 'an illegal memory access was encountered', this may be due to a TensorFlow issue. To avoid this, set the environment variable DP_INFER_BATCH_SIZE to a smaller value than the last adjusted batch size. The environment variable DP_INFER_BATCH_SIZE controls the inference batch size (nframes * natoms).
2024-11-24 13:37:10.921907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:37:10.928730: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 13:37:11.005802: I tensorflow/core/util/cuda_solvers.cc:178] Creating GpuSolver handles for stream 0x55d7b60649a0
[2024-11-24 13:37:13,446] DEEPMD INFO Adjust batch size from 1024 to 2048
[2024-11-24 13:37:13,492] DEEPMD INFO Adjust batch size from 2048 to 4096
[2024-11-24 13:37:13,599] DEEPMD INFO Adjust batch size from 4096 to 8192
[2024-11-24 13:37:13,791] DEEPMD INFO Adjust batch size from 8192 to 16384
[2024-11-24 13:37:14,372] DEEPMD INFO training data with min nbor dist: 0.8854385688525499
[2024-11-24 13:37:14,373] DEEPMD INFO training data with max nbor size: [108]
[2024-11-24 13:37:14,409] DEEPMD INFO  _____               _____   __  __  _____           _     _  _
[2024-11-24 13:37:14,409] DEEPMD INFO |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2024-11-24 13:37:14,409] DEEPMD INFO | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2024-11-24 13:37:14,409] DEEPMD INFO | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2024-11-24 13:37:14,409] DEEPMD INFO | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2024-11-24 13:37:14,409] DEEPMD INFO |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2024-11-24 13:37:14,409] DEEPMD INFO Please read and cite:
[2024-11-24 13:37:14,409] DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2024-11-24 13:37:14,409] DEEPMD INFO Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2024-11-24 13:37:14,409] DEEPMD INFO See https://deepmd.rtfd.io/credits/ for details.
[2024-11-24 13:37:14,409] DEEPMD INFO ----------------------------------------------------------------------------------------
[2024-11-24 13:37:14,409] DEEPMD INFO installed to:         /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2024-11-24 13:37:14,409] DEEPMD INFO source:
[2024-11-24 13:37:14,409] DEEPMD INFO source branch:        HEAD
[2024-11-24 13:37:14,409] DEEPMD INFO source commit:        b1be266
[2024-11-24 13:37:14,409] DEEPMD INFO source commit at:     2024-11-23 01:37:55 -0800
[2024-11-24 13:37:14,410] DEEPMD INFO use float prec:       double
[2024-11-24 13:37:14,410] DEEPMD INFO build variant:        cuda
[2024-11-24 13:37:14,410] DEEPMD INFO Backend:              TensorFlow
[2024-11-24 13:37:14,410] DEEPMD INFO TF ver:               unknown
[2024-11-24 13:37:14,410] DEEPMD INFO build with TF ver:    2.17.0
[2024-11-24 13:37:14,410] DEEPMD INFO build with TF inc:    /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include/
[2024-11-24 13:37:14,410] DEEPMD INFO                       /root/deepmd-kit/include
[2024-11-24 13:37:14,410] DEEPMD INFO build with TF lib:
[2024-11-24 13:37:14,410] DEEPMD INFO running on:           bohrium-156-1225901
[2024-11-24 13:37:14,410] DEEPMD INFO computing device:     gpu:0
[2024-11-24 13:37:14,410] DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
[2024-11-24 13:37:14,410] DEEPMD INFO Count of visible GPUs: 1
[2024-11-24 13:37:14,410] DEEPMD INFO num_intra_threads:    0
[2024-11-24 13:37:14,410] DEEPMD INFO num_inter_threads:    0
[2024-11-24 13:37:14,410] DEEPMD INFO ----------------------------------------------------------------------------------------
2024-11-24 13:37:14.414808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:37:14.418459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:14,427] DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
[2024-11-24 13:37:14,427] DEEPMD INFO found 3 system(s):
[2024-11-24 13:37:14,427] DEEPMD INFO system            natoms  bch_sz  n_bch  prob       pbc
[2024-11-24 13:37:14,427] DEEPMD INFO ../data/data_0/   192     1       80     2.500e-01  T
[2024-11-24 13:37:14,427] DEEPMD INFO ../data/data_1/   192     1       160    5.000e-01  T
[2024-11-24 13:37:14,427] DEEPMD INFO ../data/data_2/   192     1       80     2.500e-01  T
[2024-11-24 13:37:14,427] DEEPMD INFO --------------------------------------------------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO ---Summary of DataSystem: validation -----------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO found 1 system(s):
[2024-11-24 13:37:14,430] DEEPMD INFO system            natoms  bch_sz  n_bch  prob       pbc
[2024-11-24 13:37:14,430] DEEPMD INFO ../data/data_3    192     1       80     1.000e+00  T
[2024-11-24 13:37:14,430] DEEPMD INFO --------------------------------------------------------------------------------------
[2024-11-24 13:37:14,430] DEEPMD INFO training without frame parameter
[2024-11-24 13:37:14,430] DEEPMD INFO data stating... (this step may take long time)
[2024-11-24 13:37:14,517] DEEPMD INFO built lr
[2024-11-24 13:37:14,606] DEEPMD INFO use the compressible model with stripped type embedding
[2024-11-24 13:37:15,184] DEEPMD INFO built network
[2024-11-24 13:37:16,166] DEEPMD INFO built training
[2024-11-24 13:37:16,167] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:37:16.169002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:16,210] DEEPMD INFO initialize model from scratch
[2024-11-24 13:37:16,905] DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 11, decay_rate 0.893302, final lr will be 3.89e-08
[2024-11-24 13:37:19,189] DEEPMD INFO batch 0: trn: rmse = 2.61e+01, rmse_e = 1.66e-01, rmse_f = 8.24e-01, lr = 1.00e-03
[2024-11-24 13:37:19,189] DEEPMD INFO batch 0: val: rmse = 2.60e+01, rmse_e = 1.67e-01, rmse_f = 8.22e-01
2024-11-24 13:37:19.698894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:37:21,003] DEEPMD INFO batch 100: trn: rmse = 1.51e+01, rmse_e = 1.23e-03, rmse_f = 7.94e-01, lr = 3.62e-04
[2024-11-24 13:37:21,003] DEEPMD INFO batch 100: val: rmse = 1.55e+01, rmse_e = 2.31e-03, rmse_f = 8.13e-01
[2024-11-24 13:37:21,003] DEEPMD INFO batch 100: total wall time = 4.10 s
[2024-11-24 13:37:22,227] DEEPMD INFO batch 200: trn: rmse = 9.49e+00, rmse_e = 4.69e-03, rmse_f = 8.25e-01, lr = 1.31e-04
[2024-11-24 13:37:22,227] DEEPMD INFO batch 200: val: rmse = 8.65e+00, rmse_e = 3.92e-03, rmse_f = 7.53e-01
[2024-11-24 13:37:22,227] DEEPMD INFO batch 200: total wall time = 1.22 s
[2024-11-24 13:37:23,459] DEEPMD INFO batch 300: trn: rmse = 5.62e+00, rmse_e = 3.26e-03, rmse_f = 8.07e-01, lr = 4.75e-05
[2024-11-24 13:37:23,459] DEEPMD INFO batch 300: val: rmse = 5.55e+00, rmse_e = 7.08e-03, rmse_f = 7.96e-01
[2024-11-24 13:37:23,459] DEEPMD INFO batch 300: total wall time = 1.23 s
[2024-11-24 13:37:24,683] DEEPMD INFO batch 400: trn: rmse = 3.62e+00, rmse_e = 2.03e-03, rmse_f = 8.48e-01, lr = 1.72e-05
[2024-11-24 13:37:24,684] DEEPMD INFO batch 400: val: rmse = 3.45e+00, rmse_e = 1.08e-03, rmse_f = 8.08e-01
[2024-11-24 13:37:24,684] DEEPMD INFO batch 400: total wall time = 1.22 s
[2024-11-24 13:37:25,911] DEEPMD INFO batch 500: trn: rmse = 1.99e+00, rmse_e = 1.98e-03, rmse_f = 7.40e-01, lr = 6.24e-06
[2024-11-24 13:37:25,911] DEEPMD INFO batch 500: val: rmse = 1.98e+00, rmse_e = 3.81e-03, rmse_f = 7.36e-01
[2024-11-24 13:37:25,911] DEEPMD INFO batch 500: total wall time = 1.23 s
[2024-11-24 13:37:27,140] DEEPMD INFO batch 600: trn: rmse = 1.47e+00, rmse_e = 2.12e-03, rmse_f = 8.15e-01, lr = 2.26e-06
[2024-11-24 13:37:27,140] DEEPMD INFO batch 600: val: rmse = 1.44e+00, rmse_e = 2.21e-03, rmse_f = 7.95e-01
[2024-11-24 13:37:27,140] DEEPMD INFO batch 600: total wall time = 1.23 s
[2024-11-24 13:37:28,372] DEEPMD INFO batch 700: trn: rmse = 9.39e-01, rmse_e = 3.48e-04, rmse_f = 6.96e-01, lr = 8.18e-07
[2024-11-24 13:37:28,372] DEEPMD INFO batch 700: val: rmse = 1.03e+00, rmse_e = 2.61e-03, rmse_f = 7.61e-01
[2024-11-24 13:37:28,372] DEEPMD INFO batch 700: total wall time = 1.23 s
[2024-11-24 13:37:29,602] DEEPMD INFO batch 800: trn: rmse = 9.69e-01, rmse_e = 5.29e-03, rmse_f = 8.49e-01, lr = 2.96e-07
[2024-11-24 13:37:29,602] DEEPMD INFO batch 800: val: rmse = 9.12e-01, rmse_e = 2.68e-03, rmse_f = 8.00e-01
[2024-11-24 13:37:29,602] DEEPMD INFO batch 800: total wall time = 1.23 s
[2024-11-24 13:37:30,830] DEEPMD INFO batch 900: trn: rmse = 8.57e-01, rmse_e = 7.61e-03, rmse_f = 8.09e-01, lr = 1.07e-07
[2024-11-24 13:37:30,830] DEEPMD INFO batch 900: val: rmse = 8.64e-01, rmse_e = 3.19e-03, rmse_f = 8.19e-01
[2024-11-24 13:37:30,830] DEEPMD INFO batch 900: total wall time = 1.23 s
[2024-11-24 13:37:32,057] DEEPMD INFO batch 1000: trn: rmse = 7.97e-01, rmse_e = 3.67e-03, rmse_f = 7.80e-01, lr = 3.89e-08
[2024-11-24 13:37:32,057] DEEPMD INFO batch 1000: val: rmse = 8.20e-01, rmse_e = 3.26e-03, rmse_f = 8.03e-01
[2024-11-24 13:37:32,057] DEEPMD INFO batch 1000: total wall time = 1.23 s
[2024-11-24 13:37:32,209] DEEPMD INFO saved checkpoint model.ckpt
[2024-11-24 13:37:32,209] DEEPMD INFO average training time: 0.0119 s/batch (exclude first 100 batches)
[2024-11-24 13:37:32,209] DEEPMD INFO finished training
[2024-11-24 13:37:32,209] DEEPMD INFO wall time: 16.042 s
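The `lr` column in the log can be reproduced from the values on the "start training" line (start lr 1.00e-03, decay_step 11, decay_rate 0.893302). The sketch below assumes a stepped exponential decay, in which the rate is applied once per completed block of `decay_step` batches. That formula is inferred from the logged numbers rather than quoted from the DeePMD-kit source, and `stepped_lr` is a hypothetical name.

```python
def stepped_lr(step, start_lr=1.0e-3, decay_step=11, decay_rate=0.893302):
    """Learning rate after `step` batches under stepped exponential decay.

    Assumed formula: lr = start_lr * decay_rate ** floor(step / decay_step).
    """
    return start_lr * decay_rate ** (step // decay_step)


# Compare with the lr values printed at batches 0, 100, 200, 300, and 1000.
for step in (0, 100, 200, 300, 1000):
    print(f"batch {step:>4}: lr = {stepped_lr(step):.2e}")
```

Under this assumption, the printed values agree with the log to the three significant figures shown (1.00e-03, 3.62e-04, 1.31e-04, 4.75e-05, 3.89e-08). Because the original 1000000-step schedule is squeezed into 1000 steps, the learning rate falls by more than four orders of magnitude during this short run.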
[12]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --tf freeze
2024-11-24 13:38:05.540648: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:38:05.558121: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:38:05.563439: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:38:05.575998: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[bohrium-156-1225901:00626] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1823997952/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00626] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
2024-11-24 13:38:11.055650: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-11-24 13:38:11.055917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30908 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:38:11.094493: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
[2024-11-24 13:38:11,246] DEEPMD INFO The following nodes will be frozen: ['fitting_attr/daparam', 'o_force', 'model_attr/model_version', 'model_attr/tmap', 'fitting_attr/dfparam', 'model_attr/model_type', 'descrpt_attr/rcut', 't_mesh', 'train_attr/min_nbor_dist', 'model_type', 'train_attr/training_script', 'o_energy', 'o_atom_energy', 'descrpt_attr/ntypes', 'o_virial', 'o_atom_virial']
[2024-11-24 13:38:11,517] DEEPMD INFO 782 ops in the final graph.
[16]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --tf compress
2024-11-24 13:41:09.288551: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 13:41:09.305736: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 13:41:09.310908: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-24 13:41:09.323223: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:41:13.308673: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-11-24 13:41:13.308915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:13.334481: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 13:41:13.357394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:13.385652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:13,459] DEEPMD INFO
[2024-11-24 13:41:13,459] DEEPMD INFO stage 1: compress the model
[bohrium-156-1225901:00959] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/1874722816/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:00959] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[2024-11-24 13:41:19,458] DEEPMD INFO  _____               _____   __  __  _____           _     _  _
[2024-11-24 13:41:19,458] DEEPMD INFO |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2024-11-24 13:41:19,458] DEEPMD INFO | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2024-11-24 13:41:19,458] DEEPMD INFO | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2024-11-24 13:41:19,458] DEEPMD INFO | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2024-11-24 13:41:19,458] DEEPMD INFO |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2024-11-24 13:41:19,458] DEEPMD INFO Please read and cite:
[2024-11-24 13:41:19,458] DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2024-11-24 13:41:19,458] DEEPMD INFO Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2024-11-24 13:41:19,458] DEEPMD INFO See https://deepmd.rtfd.io/credits/ for details.
[2024-11-24 13:41:19,458] DEEPMD INFO ----------------------------------------------------------------------------------------
[2024-11-24 13:41:19,458] DEEPMD INFO installed to:         /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2024-11-24 13:41:19,458] DEEPMD INFO source:
[2024-11-24 13:41:19,458] DEEPMD INFO source branch:        HEAD
[2024-11-24 13:41:19,458] DEEPMD INFO source commit:        b1be266
[2024-11-24 13:41:19,458] DEEPMD INFO source commit at:     2024-11-23 01:37:55 -0800
[2024-11-24 13:41:19,458] DEEPMD INFO use float prec:       double
[2024-11-24 13:41:19,458] DEEPMD INFO build variant:        cuda
[2024-11-24 13:41:19,458] DEEPMD INFO Backend:              TensorFlow
[2024-11-24 13:41:19,458] DEEPMD INFO TF ver:               unknown
[2024-11-24 13:41:19,458] DEEPMD INFO build with TF ver:    2.17.0
[2024-11-24 13:41:19,458] DEEPMD INFO build with TF inc:    /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include/
[2024-11-24 13:41:19,459] DEEPMD INFO                       /root/deepmd-kit/include
[2024-11-24 13:41:19,459] DEEPMD INFO build with TF lib:
[2024-11-24 13:41:19,459] DEEPMD INFO running on:           bohrium-156-1225901
[2024-11-24 13:41:19,459] DEEPMD INFO computing device:     gpu:0
[2024-11-24 13:41:19,459] DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
[2024-11-24 13:41:19,459] DEEPMD INFO Count of visible GPUs: 1
[2024-11-24 13:41:19,459] DEEPMD INFO num_intra_threads:    0
[2024-11-24 13:41:19,459] DEEPMD INFO num_inter_threads:    0
[2024-11-24 13:41:19,459] DEEPMD INFO ----------------------------------------------------------------------------------------
2024-11-24 13:41:19.465275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.468901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:19,469] DEEPMD INFO training without frame parameter
2024-11-24 13:41:19.537218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.538325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.561792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:19,587] DEEPMD INFO training data with lower boundary: [-0.3899236 -0.42140616]
[2024-11-24 13:41:19,587] DEEPMD INFO training data with upper boundary: [7.18840882 8.14703945]
2024-11-24 13:41:19.901264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.938585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.961196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:19.984552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 13:41:20.014652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:20,047] DEEPMD INFO built lr
[2024-11-24 13:41:20,292] DEEPMD INFO use the compressible model with stripped type embedding
[2024-11-24 13:41:20,657] DEEPMD INFO built network
[2024-11-24 13:41:21,115] DEEPMD INFO built training
[2024-11-24 13:41:21,115] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 13:41:21.117035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:21,141] DEEPMD INFO initialize model from scratch
[2024-11-24 13:41:21,535] DEEPMD INFO finished compressing
[2024-11-24 13:41:21,541] DEEPMD INFO
[2024-11-24 13:41:21,541] DEEPMD INFO stage 2: freeze the model
2024-11-24 13:41:21.762211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22418 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
[2024-11-24 13:41:21,891] DEEPMD INFO The following nodes will be frozen: ['t_mesh', 'o_atom_energy', 'model_attr/tmap', 'o_energy', 'descrpt_attr/ntypes', 'descrpt_attr/rcut', 'o_atom_virial', 'model_attr/model_version', 'fitting_attr/daparam', 'train_attr/min_nbor_dist', 'o_force', 'model_type', 'o_virial', 'fitting_attr/dfparam', 'train_attr/training_script', 'model_attr/model_type']
[2024-11-24 13:41:21,999] DEEPMD INFO 633 ops in the final graph.
PyTorch
[17]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --pt train input.json
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. [2024-11-24 13:41:32,604] DEEPMD INFO DeePMD version: 3.0.0 [2024-11-24 13:41:32,604] DEEPMD INFO Configuration path: input.json [bohrium-156-1225901:01051] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/2329346048/shared_mem_cuda_pool.bohrium-156-1225901 could be created. [bohrium-156-1225901:01051] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 [2024-11-24 13:41:33,817] DEEPMD INFO _____ _____ __ __ _____ _ _ _ [2024-11-24 13:41:33,817] DEEPMD INFO | __ \ | __ \ | \/ || __ \ | | (_)| | [2024-11-24 13:41:33,817] DEEPMD INFO | | | | ___ ___ | |__) || \ / || | | | ______ | | __ _ | |_ [2024-11-24 13:41:33,817] DEEPMD INFO | | | | / _ \ / _ \| ___/ | |\/| || | | ||______|| |/ /| || __| [2024-11-24 13:41:33,817] DEEPMD INFO | |__| || __/| __/| | | | | || |__| | | < | || |_ [2024-11-24 13:41:33,817] DEEPMD INFO |_____/ \___| \___||_| |_| |_||_____/ |_|\_\|_| \__| [2024-11-24 13:41:33,817] DEEPMD INFO Please read and cite: [2024-11-24 13:41:33,817] DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018) [2024-11-24 13:41:33,817] DEEPMD INFO Zeng et al, J. Chem. Phys., 159, 054801 (2023) [2024-11-24 13:41:33,817] DEEPMD INFO See https://deepmd.rtfd.io/credits/ for details. 
[2024-11-24 13:41:33,817] DEEPMD INFO --------------------------------------------------------------------------------------------------------- [2024-11-24 13:41:33,817] DEEPMD INFO installed to: /root/deepmd-kit/lib/python3.12/site-packages/deepmd [2024-11-24 13:41:33,818] DEEPMD INFO source: [2024-11-24 13:41:33,818] DEEPMD INFO source branch: HEAD [2024-11-24 13:41:33,818] DEEPMD INFO source commit: b1be266 [2024-11-24 13:41:33,818] DEEPMD INFO source commit at: 2024-11-23 01:37:55 -0800 [2024-11-24 13:41:33,818] DEEPMD INFO use float prec: double [2024-11-24 13:41:33,818] DEEPMD INFO build variant: cuda [2024-11-24 13:41:33,818] DEEPMD INFO Backend: PyTorch [2024-11-24 13:41:33,818] DEEPMD INFO PT ver: v2.4.1.post302-gUnknown [2024-11-24 13:41:33,818] DEEPMD INFO Enable custom OP: True [2024-11-24 13:41:33,818] DEEPMD INFO build with PT ver: 2.4.1 [2024-11-24 13:41:33,818] DEEPMD INFO build with PT inc: /root/deepmd-kit/lib/python3.12/site-packages/torch/include [2024-11-24 13:41:33,818] DEEPMD INFO /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include [2024-11-24 13:41:33,818] DEEPMD INFO build with PT lib: /root/deepmd-kit/lib/python3.12/site-packages/torch/lib [2024-11-24 13:41:33,818] DEEPMD INFO running on: bohrium-156-1225901 [2024-11-24 13:41:33,818] DEEPMD INFO computing device: cuda:0 [2024-11-24 13:41:33,818] DEEPMD INFO CUDA_VISIBLE_DEVICES: unset [2024-11-24 13:41:33,818] DEEPMD INFO Count of visible GPUs: 1 [2024-11-24 13:41:33,818] DEEPMD INFO num_intra_threads: 0 [2024-11-24 13:41:33,818] DEEPMD INFO num_inter_threads: 0 [2024-11-24 13:41:33,818] DEEPMD INFO --------------------------------------------------------------------------------------------------------- [2024-11-24 13:41:33,975] DEEPMD INFO Calculate neighbor statistics... 
(add --skip-neighbor-stat to skip this step) [2024-11-24 13:41:37,538] DEEPMD INFO Adjust batch size from 1024 to 2048 [2024-11-24 13:41:37,657] DEEPMD INFO Adjust batch size from 2048 to 4096 [2024-11-24 13:41:37,747] DEEPMD INFO Adjust batch size from 4096 to 8192 [2024-11-24 13:41:38,111] DEEPMD INFO Adjust batch size from 8192 to 16384 [2024-11-24 13:41:38,179] DEEPMD INFO training data with min nbor dist: 0.8854385688525499 [2024-11-24 13:41:38,180] DEEPMD INFO training data with max nbor size: [108] [2024-11-24 13:41:38,238] DEEPMD INFO Packing data for statistics from 3 systems [2024-11-24 13:41:38,341] DEEPMD INFO RMSE of energy per atom after linear regression is: 0.003581501976900343 in the unit of energy. [2024-11-24 13:41:38,860] DEEPMD INFO ---Summary of DataSystem: training ----------------------------------------------- [2024-11-24 13:41:38,860] DEEPMD INFO found 3 system(s): [2024-11-24 13:41:38,860] DEEPMD INFO system natoms bch_sz n_bch prob pbc [2024-11-24 13:41:38,860] DEEPMD INFO ../data/data_0/ 192 1 80 2.500e-01 T [2024-11-24 13:41:38,860] DEEPMD INFO ../data/data_1/ 192 1 160 5.000e-01 T [2024-11-24 13:41:38,860] DEEPMD INFO ../data/data_2/ 192 1 80 2.500e-01 T [2024-11-24 13:41:38,861] DEEPMD INFO -------------------------------------------------------------------------------------- [2024-11-24 13:41:38,863] DEEPMD INFO ---Summary of DataSystem: validation ----------------------------------------------- [2024-11-24 13:41:38,863] DEEPMD INFO found 1 system(s): [2024-11-24 13:41:38,864] DEEPMD INFO system natoms bch_sz n_bch prob pbc [2024-11-24 13:41:38,864] DEEPMD INFO ../data/data_3 192 1 80 1.000e+00 T [2024-11-24 13:41:38,864] DEEPMD INFO -------------------------------------------------------------------------------------- [2024-11-24 13:41:38,867] DEEPMD INFO Start to train 1000 steps. 
[2024-11-24 13:41:40,496] DEEPMD INFO batch 1: trn: rmse = 2.54e+01, rmse_e = 1.80e+00, rmse_f = 7.94e-01, lr = 1.00e-03 [2024-11-24 13:41:40,496] DEEPMD INFO batch 1: val: rmse = 3.66e+01, rmse_e = 1.94e+00, rmse_f = 1.15e+00 [2024-11-24 13:41:40,496] DEEPMD INFO batch 1: total wall time = 1.63 s [2024-11-24 13:41:42,309] DEEPMD INFO batch 100: trn: rmse = 1.43e+01, rmse_e = 3.69e-03, rmse_f = 7.52e-01, lr = 3.62e-04 [2024-11-24 13:41:42,309] DEEPMD INFO batch 100: val: rmse = 1.46e+01, rmse_e = 3.72e-03, rmse_f = 7.65e-01 [2024-11-24 13:41:42,309] DEEPMD INFO batch 100: total wall time = 1.81 s [2024-11-24 13:41:44,146] DEEPMD INFO batch 200: trn: rmse = 7.71e+00, rmse_e = 1.72e-02, rmse_f = 6.70e-01, lr = 1.31e-04 [2024-11-24 13:41:44,146] DEEPMD INFO batch 200: val: rmse = 8.22e+00, rmse_e = 9.88e-03, rmse_f = 7.15e-01 [2024-11-24 13:41:44,146] DEEPMD INFO batch 200: total wall time = 1.84 s [2024-11-24 13:41:45,985] DEEPMD INFO batch 300: trn: rmse = 4.86e+00, rmse_e = 1.64e-03, rmse_f = 6.98e-01, lr = 4.75e-05 [2024-11-24 13:41:45,986] DEEPMD INFO batch 300: val: rmse = 4.81e+00, rmse_e = 3.24e-03, rmse_f = 6.91e-01 [2024-11-24 13:41:45,986] DEEPMD INFO batch 300: total wall time = 1.84 s [2024-11-24 13:41:47,826] DEEPMD INFO batch 400: trn: rmse = 3.04e+00, rmse_e = 1.54e-05, rmse_f = 7.13e-01, lr = 1.72e-05 [2024-11-24 13:41:47,826] DEEPMD INFO batch 400: val: rmse = 3.07e+00, rmse_e = 1.01e-03, rmse_f = 7.20e-01 [2024-11-24 13:41:47,827] DEEPMD INFO batch 400: total wall time = 1.84 s [2024-11-24 13:41:49,651] DEEPMD INFO batch 500: trn: rmse = 1.83e+00, rmse_e = 4.46e-03, rmse_f = 6.80e-01, lr = 6.24e-06 [2024-11-24 13:41:49,652] DEEPMD INFO batch 500: val: rmse = 1.85e+00, rmse_e = 1.77e-03, rmse_f = 6.88e-01 [2024-11-24 13:41:49,652] DEEPMD INFO batch 500: total wall time = 1.83 s [2024-11-24 13:41:51,479] DEEPMD INFO batch 600: trn: rmse = 1.27e+00, rmse_e = 9.51e-04, rmse_f = 7.03e-01, lr = 2.26e-06 [2024-11-24 13:41:51,479] DEEPMD INFO batch 600: 
val: rmse = 1.20e+00, rmse_e = 2.42e-03, rmse_f = 6.64e-01 [2024-11-24 13:41:51,479] DEEPMD INFO batch 600: total wall time = 1.83 s [2024-11-24 13:41:53,297] DEEPMD INFO batch 700: trn: rmse = 8.49e-01, rmse_e = 3.27e-04, rmse_f = 6.30e-01, lr = 8.18e-07 [2024-11-24 13:41:53,297] DEEPMD INFO batch 700: val: rmse = 9.10e-01, rmse_e = 3.94e-03, rmse_f = 6.74e-01 [2024-11-24 13:41:53,297] DEEPMD INFO batch 700: total wall time = 1.82 s [2024-11-24 13:41:55,116] DEEPMD INFO batch 800: trn: rmse = 7.90e-01, rmse_e = 1.85e-04, rmse_f = 6.94e-01, lr = 2.96e-07 [2024-11-24 13:41:55,116] DEEPMD INFO batch 800: val: rmse = 7.95e-01, rmse_e = 7.31e-03, rmse_f = 6.91e-01 [2024-11-24 13:41:55,116] DEEPMD INFO batch 800: total wall time = 1.82 s [2024-11-24 13:41:56,935] DEEPMD INFO batch 900: trn: rmse = 7.36e-01, rmse_e = 3.92e-03, rmse_f = 6.98e-01, lr = 1.07e-07 [2024-11-24 13:41:56,935] DEEPMD INFO batch 900: val: rmse = 7.23e-01, rmse_e = 2.54e-03, rmse_f = 6.86e-01 [2024-11-24 13:41:56,935] DEEPMD INFO batch 900: total wall time = 1.82 s [2024-11-24 13:41:58,762] DEEPMD INFO batch 1000: trn: rmse = 7.16e-01, rmse_e = 1.34e-03, rmse_f = 7.02e-01, lr = 3.89e-08 [2024-11-24 13:41:58,762] DEEPMD INFO batch 1000: val: rmse = 7.11e-01, rmse_e = 7.25e-03, rmse_f = 6.90e-01 [2024-11-24 13:41:58,762] DEEPMD INFO batch 1000: total wall time = 1.83 s [2024-11-24 13:41:58,799] DEEPMD INFO Saved model to model.ckpt-1000.pt [2024-11-24 13:41:58,800] DEEPMD INFO average training time: 0.0165 s/batch [2024-11-24 13:41:58,800] DEEPMD INFO Trained model has been saved to: model.ckpt
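To follow convergence without reading the raw log, the training lines can be parsed into numbers. The following is a minimal sketch, not part of DeePMD-kit: the function name `parse_training_log` and the regular expression are assumptions based only on the log format printed above, and the sample text is copied from that log.

```python
import re

# Lines copied from the PyTorch training log above.
log_text = """
[2024-11-24 13:41:42,309] DEEPMD INFO batch 100: trn: rmse = 1.43e+01, rmse_e = 3.69e-03, rmse_f = 7.52e-01, lr = 3.62e-04
[2024-11-24 13:41:42,309] DEEPMD INFO batch 100: val: rmse = 1.46e+01, rmse_e = 3.72e-03, rmse_f = 7.65e-01
[2024-11-24 13:41:58,762] DEEPMD INFO batch 1000: trn: rmse = 7.16e-01, rmse_e = 1.34e-03, rmse_f = 7.02e-01, lr = 3.89e-08
"""

NUM = r"([-+.\deE]+)"
# Hypothetical pattern matching only the "trn" (training) lines.
TRAIN_RE = re.compile(
    r"batch\s+(\d+):\s+trn:\s+rmse\s*=\s*" + NUM
    + r",\s*rmse_e\s*=\s*" + NUM
    + r",\s*rmse_f\s*=\s*" + NUM
    + r",\s*lr\s*=\s*" + NUM
)


def parse_training_log(text):
    """Return one dict per training line found in the log text."""
    rows = []
    for batch, rmse, rmse_e, rmse_f, lr in TRAIN_RE.findall(text):
        rows.append(
            {
                "batch": int(batch),
                "rmse": float(rmse),
                "rmse_e": float(rmse_e),
                "rmse_f": float(rmse_f),
                "lr": float(lr),
            }
        )
    return rows


rows = parse_training_log(log_text)
for row in rows:
    print(row["batch"], row["rmse_e"], row["rmse_f"], row["lr"])
```

Validation lines are skipped here because the pattern requires `trn:`. A second pattern without the `lr` field would be needed to collect them.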
[18]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --pt freeze
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. [2024-11-24 13:42:29,848] DEEPMD INFO DeePMD version: 3.0.0 [2024-11-24 13:42:32,084] DEEPMD INFO Saved frozen model to frozen_model.pth
[19]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp --pt compress
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. [2024-11-24 13:42:36,619] DEEPMD INFO DeePMD version: 3.0.0 [2024-11-24 13:42:38,442] DEEPMD INFO training data with lower boundary: [[-0.38978084 -0. -0. -0. ] [-0.42122849 -0. -0. -0. ]] [2024-11-24 13:42:38,442] DEEPMD INFO training data with upper boundary: [[ 7.19234743 12.23617724 12.23617724 12.23617724] [ 8.15177409 13.68445638 13.68445638 13.68445638]]
We now have four model files: frozen_model.pb, frozen_model_compressed.pb, frozen_model.pth, and frozen_model_compressed.pth.
Model conversion
The JAX backend does not currently support training, so we use dp convert-backend to convert the PyTorch backend model file into a JAX backend model file:
[20]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp convert-backend frozen_model.pth frozen_model.savedmodel
[2024-11-24 13:43:31,664] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. 2024-11-24 13:43:35.224619: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 13:43:35.244121: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 13:43:35.250210: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Model testing
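dp test determines the backend from the model file's extension, so neither --tf nor --pt is needed.
A reminder: the model was trained for only 1000 steps, so a large RMSE is expected.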
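The idea of choosing a backend from the file extension can be illustrated with a few lines of Python. This is only a sketch: the mapping covers just the three extensions that appear in this tutorial, and `guess_backend` is a hypothetical helper, not DeePMD-kit's own detection code, which may recognize more formats.

```python
from pathlib import Path

# Extensions taken from the model files produced in this tutorial.
SUFFIX_TO_BACKEND = {
    ".pb": "TensorFlow",
    ".pth": "PyTorch",
    ".savedmodel": "JAX",
}


def guess_backend(model_file):
    """Hypothetical helper: map a model file name to a backend name."""
    suffix = Path(model_file).suffix
    try:
        return SUFFIX_TO_BACKEND[suffix]
    except KeyError:
        raise ValueError(f"unknown model suffix: {suffix!r}") from None


for name in (
    "frozen_model_compressed.pb",
    "frozen_model_compressed.pth",
    "frozen_model.savedmodel",
):
    print(name, "->", guess_backend(name))
```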
[22]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp test -m frozen_model_compressed.pb -s ../data
2024-11-24 13:45:14.152459: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 13:45:14.169647: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 13:45:14.174884: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered [2024-11-24 13:45:16,573] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. [2024-11-24 13:45:18,327] DEEPMD INFO If you encounter the error 'an illegal memory access was encountered', this may be due to a TensorFlow issue. To avoid this, set the environment variable DP_INFER_BATCH_SIZE to a smaller value than the last adjusted batch size. The environment variable DP_INFER_BATCH_SIZE controls the inference batch size (nframes * natoms). 
[2024-11-24 13:45:18,357] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:45:18,357] DEEPMD INFO # testing system : ../data/data_1 [2024-11-24 13:45:21,221] DEEPMD INFO Adjust batch size from 1024 to 2048 [2024-11-24 13:45:21,244] DEEPMD INFO Adjust batch size from 2048 to 4096 [2024-11-24 13:45:21,287] DEEPMD INFO Adjust batch size from 4096 to 8192 [2024-11-24 13:45:21,369] DEEPMD INFO Adjust batch size from 8192 to 16384 [2024-11-24 13:45:21,546] DEEPMD INFO # number of test data : 160 [2024-11-24 13:45:21,546] DEEPMD INFO Energy MAE : 5.179024e-01 eV [2024-11-24 13:45:21,546] DEEPMD INFO Energy RMSE : 6.224318e-01 eV [2024-11-24 13:45:21,546] DEEPMD INFO Energy MAE/Natoms : 2.697408e-03 eV [2024-11-24 13:45:21,546] DEEPMD INFO Energy RMSE/Natoms : 3.241832e-03 eV [2024-11-24 13:45:21,546] DEEPMD INFO Force MAE : 5.844789e-01 eV/A [2024-11-24 13:45:21,546] DEEPMD INFO Force RMSE : 7.829774e-01 eV/A [2024-11-24 13:45:21,547] DEEPMD INFO Virial MAE : 1.972750e+01 eV [2024-11-24 13:45:21,547] DEEPMD INFO Virial RMSE : 3.339188e+01 eV [2024-11-24 13:45:21,547] DEEPMD INFO Virial MAE/Natoms : 1.027474e-01 eV [2024-11-24 13:45:21,547] DEEPMD INFO Virial RMSE/Natoms : 1.739160e-01 eV [2024-11-24 13:45:21,547] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:45:21,547] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:45:21,547] DEEPMD INFO # testing system : ../data/data_2 [2024-11-24 13:45:21,627] DEEPMD INFO # number of test data : 80 [2024-11-24 13:45:21,628] DEEPMD INFO Energy MAE : 4.779663e-01 eV [2024-11-24 13:45:21,628] DEEPMD INFO Energy RMSE : 6.033248e-01 eV [2024-11-24 13:45:21,628] DEEPMD INFO Energy MAE/Natoms : 2.489408e-03 eV [2024-11-24 13:45:21,628] DEEPMD INFO Energy RMSE/Natoms : 3.142316e-03 eV [2024-11-24 13:45:21,628] DEEPMD INFO Force MAE : 5.813479e-01 eV/A [2024-11-24 13:45:21,628] DEEPMD INFO Force RMSE : 7.761452e-01 eV/A [2024-11-24 13:45:21,628] DEEPMD INFO 
Virial MAE : 1.961763e+01 eV [2024-11-24 13:45:21,628] DEEPMD INFO Virial RMSE : 3.325970e+01 eV [2024-11-24 13:45:21,628] DEEPMD INFO Virial MAE/Natoms : 1.021752e-01 eV [2024-11-24 13:45:21,628] DEEPMD INFO Virial RMSE/Natoms : 1.732276e-01 eV [2024-11-24 13:45:21,628] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:45:21,628] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:45:21,628] DEEPMD INFO # testing system : ../data/data_0 [2024-11-24 13:45:21,710] DEEPMD INFO # number of test data : 80 [2024-11-24 13:45:21,710] DEEPMD INFO Energy MAE : 5.671510e-01 eV [2024-11-24 13:45:21,710] DEEPMD INFO Energy RMSE : 7.216989e-01 eV [2024-11-24 13:45:21,710] DEEPMD INFO Energy MAE/Natoms : 2.953911e-03 eV [2024-11-24 13:45:21,710] DEEPMD INFO Energy RMSE/Natoms : 3.758848e-03 eV [2024-11-24 13:45:21,710] DEEPMD INFO Force MAE : 5.845286e-01 eV/A [2024-11-24 13:45:21,710] DEEPMD INFO Force RMSE : 7.833679e-01 eV/A [2024-11-24 13:45:21,710] DEEPMD INFO Virial MAE : 1.964959e+01 eV [2024-11-24 13:45:21,710] DEEPMD INFO Virial RMSE : 3.334908e+01 eV [2024-11-24 13:45:21,710] DEEPMD INFO Virial MAE/Natoms : 1.023416e-01 eV [2024-11-24 13:45:21,710] DEEPMD INFO Virial RMSE/Natoms : 1.736931e-01 eV [2024-11-24 13:45:21,710] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:45:21,710] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:45:21,710] DEEPMD INFO # testing system : ../data/data_3 [2024-11-24 13:45:21,789] DEEPMD INFO # number of test data : 80 [2024-11-24 13:45:21,789] DEEPMD INFO Energy MAE : 5.817025e-01 eV [2024-11-24 13:45:21,789] DEEPMD INFO Energy RMSE : 7.101678e-01 eV [2024-11-24 13:45:21,789] DEEPMD INFO Energy MAE/Natoms : 3.029701e-03 eV [2024-11-24 13:45:21,789] DEEPMD INFO Energy RMSE/Natoms : 3.698790e-03 eV [2024-11-24 13:45:21,789] DEEPMD INFO Force MAE : 5.875087e-01 eV/A [2024-11-24 13:45:21,789] DEEPMD INFO Force RMSE : 7.852253e-01 eV/A 
[2024-11-24 13:45:21,789] DEEPMD INFO Virial MAE : 1.962188e+01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial RMSE : 3.324730e+01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial MAE/Natoms : 1.021973e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial RMSE/Natoms : 1.731630e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:45:21,790] DEEPMD INFO # ----------weighted average of errors----------- [2024-11-24 13:45:21,790] DEEPMD INFO # number of systems : 4 [2024-11-24 13:45:21,790] DEEPMD INFO Energy MAE : 5.325249e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Energy RMSE : 6.578801e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Energy MAE/Natoms : 2.773567e-03 eV [2024-11-24 13:45:21,790] DEEPMD INFO Energy RMSE/Natoms : 3.426459e-03 eV [2024-11-24 13:45:21,790] DEEPMD INFO Force MAE : 5.844686e-01 eV/A [2024-11-24 13:45:21,790] DEEPMD INFO Force RMSE : 7.821448e-01 eV/A [2024-11-24 13:45:21,790] DEEPMD INFO Virial MAE : 1.966882e+01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial RMSE : 3.332803e+01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial MAE/Natoms : 1.024418e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO Virial RMSE/Natoms : 1.735835e-01 eV [2024-11-24 13:45:21,790] DEEPMD INFO # -----------------------------------------------
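The "weighted average of errors" block can be cross-checked by hand from the per-system energy errors. The sketch below assumes that each system is weighted by its number of test frames, with the MAE averaged linearly and the RMSE averaged in quadrature. That weighting is an assumption inferred from the numbers in the log above, not a statement taken from the DeePMD-kit source.

```python
import math

# (number of test frames, energy MAE in eV, energy RMSE in eV) per system,
# copied from the dp test output above for frozen_model_compressed.pb.
systems = {
    "data_1": (160, 5.179024e-01, 6.224318e-01),
    "data_2": (80, 4.779663e-01, 6.033248e-01),
    "data_0": (80, 5.671510e-01, 7.216989e-01),
    "data_3": (80, 5.817025e-01, 7.101678e-01),
}

total_frames = sum(n for n, _, _ in systems.values())

# Frame-weighted mean of the absolute errors.
energy_mae = sum(n * mae for n, mae, _ in systems.values()) / total_frames

# Frame-weighted mean of the squared errors, then the square root.
energy_rmse = math.sqrt(
    sum(n * rmse**2 for n, _, rmse in systems.values()) / total_frames
)

print(f"Energy MAE  : {energy_mae:.6e} eV")
print(f"Energy RMSE : {energy_rmse:.6e} eV")
```

Under this assumption the results agree, within rounding, with the logged weighted averages of 5.325249e-01 eV and 6.578801e-01 eV.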
[24]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp test -m frozen_model_compressed.pth -s ../data
[2024-11-24 13:46:02,650] DEEPMD WARNING To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. [2024-11-24 13:46:05,378] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:46:05,378] DEEPMD INFO # testing system : ../data/data_1 [2024-11-24 13:46:07,995] DEEPMD INFO Adjust batch size from 1024 to 2048 [2024-11-24 13:46:08,686] DEEPMD INFO Adjust batch size from 2048 to 4096 [2024-11-24 13:46:09,696] DEEPMD INFO Adjust batch size from 4096 to 8192 [2024-11-24 13:46:10,442] DEEPMD INFO Adjust batch size from 8192 to 16384 [2024-11-24 13:46:11,741] DEEPMD INFO # number of test data : 160 [2024-11-24 13:46:11,741] DEEPMD INFO Energy MAE : 6.760290e-01 eV [2024-11-24 13:46:11,741] DEEPMD INFO Energy RMSE : 8.614848e-01 eV [2024-11-24 13:46:11,741] DEEPMD INFO Energy MAE/Natoms : 3.520984e-03 eV [2024-11-24 13:46:11,741] DEEPMD INFO Energy RMSE/Natoms : 4.486900e-03 eV [2024-11-24 13:46:11,741] DEEPMD INFO Force MAE : 5.224798e-01 eV/A [2024-11-24 13:46:11,741] DEEPMD INFO Force RMSE : 6.874469e-01 eV/A [2024-11-24 13:46:11,741] DEEPMD INFO Virial MAE : 1.055268e+02 eV [2024-11-24 13:46:11,741] DEEPMD INFO Virial RMSE : 1.807128e+02 eV [2024-11-24 13:46:11,741] DEEPMD INFO Virial MAE/Natoms : 5.496190e-01 eV [2024-11-24 13:46:11,741] DEEPMD INFO Virial RMSE/Natoms : 9.412124e-01 eV [2024-11-24 13:46:11,741] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:46:11,741] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:46:11,741] DEEPMD INFO # testing system : ../data/data_2 [2024-11-24 13:46:11,867] DEEPMD INFO # number of test data : 80 [2024-11-24 13:46:11,868] DEEPMD INFO Energy MAE : 5.619077e-01 eV [2024-11-24 13:46:11,868] DEEPMD INFO Energy RMSE : 7.171949e-01 eV [2024-11-24 
13:46:11,868] DEEPMD INFO Energy MAE/Natoms : 2.926602e-03 eV [2024-11-24 13:46:11,868] DEEPMD INFO Energy RMSE/Natoms : 3.735390e-03 eV [2024-11-24 13:46:11,868] DEEPMD INFO Force MAE : 5.206556e-01 eV/A [2024-11-24 13:46:11,868] DEEPMD INFO Force RMSE : 6.828324e-01 eV/A [2024-11-24 13:46:11,868] DEEPMD INFO Virial MAE : 1.053576e+02 eV [2024-11-24 13:46:11,868] DEEPMD INFO Virial RMSE : 1.804098e+02 eV [2024-11-24 13:46:11,868] DEEPMD INFO Virial MAE/Natoms : 5.487373e-01 eV [2024-11-24 13:46:11,868] DEEPMD INFO Virial RMSE/Natoms : 9.396344e-01 eV [2024-11-24 13:46:11,868] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:46:11,868] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:46:11,868] DEEPMD INFO # testing system : ../data/data_0 [2024-11-24 13:46:11,991] DEEPMD INFO # number of test data : 80 [2024-11-24 13:46:11,992] DEEPMD INFO Energy MAE : 7.195613e-01 eV [2024-11-24 13:46:11,992] DEEPMD INFO Energy RMSE : 8.736386e-01 eV [2024-11-24 13:46:11,992] DEEPMD INFO Energy MAE/Natoms : 3.747715e-03 eV [2024-11-24 13:46:11,992] DEEPMD INFO Energy RMSE/Natoms : 4.550201e-03 eV [2024-11-24 13:46:11,992] DEEPMD INFO Force MAE : 5.222212e-01 eV/A [2024-11-24 13:46:11,992] DEEPMD INFO Force RMSE : 6.863119e-01 eV/A [2024-11-24 13:46:11,992] DEEPMD INFO Virial MAE : 1.055419e+02 eV [2024-11-24 13:46:11,992] DEEPMD INFO Virial RMSE : 1.807917e+02 eV [2024-11-24 13:46:11,992] DEEPMD INFO Virial MAE/Natoms : 5.496976e-01 eV [2024-11-24 13:46:11,992] DEEPMD INFO Virial RMSE/Natoms : 9.416233e-01 eV [2024-11-24 13:46:11,992] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:46:11,992] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:46:11,992] DEEPMD INFO # testing system : ../data/data_3 [2024-11-24 13:46:12,114] DEEPMD INFO # number of test data : 80 [2024-11-24 13:46:12,114] DEEPMD INFO Energy MAE : 6.737088e-01 eV [2024-11-24 13:46:12,114] DEEPMD INFO 
Energy RMSE : 8.147767e-01 eV [2024-11-24 13:46:12,114] DEEPMD INFO Energy MAE/Natoms : 3.508900e-03 eV [2024-11-24 13:46:12,114] DEEPMD INFO Energy RMSE/Natoms : 4.243629e-03 eV [2024-11-24 13:46:12,115] DEEPMD INFO Force MAE : 5.249346e-01 eV/A [2024-11-24 13:46:12,115] DEEPMD INFO Force RMSE : 6.895287e-01 eV/A [2024-11-24 13:46:12,115] DEEPMD INFO Virial MAE : 1.051727e+02 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial RMSE : 1.802760e+02 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial MAE/Natoms : 5.477746e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial RMSE/Natoms : 9.389376e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:46:12,115] DEEPMD INFO # ----------weighted average of errors----------- [2024-11-24 13:46:12,115] DEEPMD INFO # number of systems : 4 [2024-11-24 13:46:12,115] DEEPMD INFO Energy MAE : 6.614471e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO Energy RMSE : 8.277423e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO Energy MAE/Natoms : 3.445037e-03 eV [2024-11-24 13:46:12,115] DEEPMD INFO Energy RMSE/Natoms : 4.311158e-03 eV [2024-11-24 13:46:12,115] DEEPMD INFO Force MAE : 5.225542e-01 eV/A [2024-11-24 13:46:12,115] DEEPMD INFO Force RMSE : 6.867169e-01 eV/A [2024-11-24 13:46:12,115] DEEPMD INFO Virial MAE : 1.054252e+02 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial RMSE : 1.805807e+02 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial MAE/Natoms : 5.490895e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO Virial RMSE/Natoms : 9.405246e-01 eV [2024-11-24 13:46:12,115] DEEPMD INFO # -----------------------------------------------
[26]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
dp test -m frozen_model.savedmodel -s ../data
2024-11-24 13:46:45.393625: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 13:46:45.411187: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 13:46:45.416461: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered WARNING:absl:Importing a function (__inference_internal_grad_fn_2447) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2372) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2522) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2597) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2447) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2372) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2522) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. WARNING:absl:Importing a function (__inference_internal_grad_fn_2597) with ops with unsaved custom gradients. Will likely fail if a gradient is requested. 
[2024-11-24 13:46:51,175] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:46:51,175] DEEPMD INFO # testing system : ../data/data_1 [2024-11-24 13:46:59,689] DEEPMD INFO Adjust batch size from 1024 to 2048 [2024-11-24 13:47:03,304] DEEPMD INFO Adjust batch size from 2048 to 4096 [2024-11-24 13:47:07,243] DEEPMD INFO Adjust batch size from 4096 to 8192 [2024-11-24 13:47:11,693] DEEPMD INFO Adjust batch size from 8192 to 16384 [2024-11-24 13:47:16,897] DEEPMD INFO # number of test data : 160 [2024-11-24 13:47:16,898] DEEPMD INFO Energy MAE : 6.760290e-01 eV [2024-11-24 13:47:16,898] DEEPMD INFO Energy RMSE : 8.614848e-01 eV [2024-11-24 13:47:16,898] DEEPMD INFO Energy MAE/Natoms : 3.520984e-03 eV [2024-11-24 13:47:16,898] DEEPMD INFO Energy RMSE/Natoms : 4.486900e-03 eV [2024-11-24 13:47:16,898] DEEPMD INFO Force MAE : 5.224798e-01 eV/A [2024-11-24 13:47:16,898] DEEPMD INFO Force RMSE : 6.874469e-01 eV/A [2024-11-24 13:47:16,898] DEEPMD INFO Virial MAE : 1.055268e+02 eV [2024-11-24 13:47:16,898] DEEPMD INFO Virial RMSE : 1.807128e+02 eV [2024-11-24 13:47:16,898] DEEPMD INFO Virial MAE/Natoms : 5.496190e-01 eV [2024-11-24 13:47:16,898] DEEPMD INFO Virial RMSE/Natoms : 9.412124e-01 eV [2024-11-24 13:47:16,898] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:47:16,898] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:47:16,898] DEEPMD INFO # testing system : ../data/data_2 [2024-11-24 13:47:22,023] DEEPMD INFO # number of test data : 80 [2024-11-24 13:47:22,023] DEEPMD INFO Energy MAE : 5.619077e-01 eV [2024-11-24 13:47:22,023] DEEPMD INFO Energy RMSE : 7.171949e-01 eV [2024-11-24 13:47:22,023] DEEPMD INFO Energy MAE/Natoms : 2.926602e-03 eV [2024-11-24 13:47:22,023] DEEPMD INFO Energy RMSE/Natoms : 3.735390e-03 eV [2024-11-24 13:47:22,023] DEEPMD INFO Force MAE : 5.206556e-01 eV/A [2024-11-24 13:47:22,023] DEEPMD INFO Force RMSE : 6.828324e-01 eV/A [2024-11-24 13:47:22,023] DEEPMD INFO 
Virial MAE : 1.053576e+02 eV [2024-11-24 13:47:22,023] DEEPMD INFO Virial RMSE : 1.804098e+02 eV [2024-11-24 13:47:22,023] DEEPMD INFO Virial MAE/Natoms : 5.487373e-01 eV [2024-11-24 13:47:22,023] DEEPMD INFO Virial RMSE/Natoms : 9.396344e-01 eV [2024-11-24 13:47:22,023] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:47:22,023] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:47:22,023] DEEPMD INFO # testing system : ../data/data_0 [2024-11-24 13:47:22,497] DEEPMD INFO # number of test data : 80 [2024-11-24 13:47:22,497] DEEPMD INFO Energy MAE : 7.195613e-01 eV [2024-11-24 13:47:22,497] DEEPMD INFO Energy RMSE : 8.736386e-01 eV [2024-11-24 13:47:22,497] DEEPMD INFO Energy MAE/Natoms : 3.747715e-03 eV [2024-11-24 13:47:22,497] DEEPMD INFO Energy RMSE/Natoms : 4.550201e-03 eV [2024-11-24 13:47:22,497] DEEPMD INFO Force MAE : 5.222212e-01 eV/A [2024-11-24 13:47:22,497] DEEPMD INFO Force RMSE : 6.863119e-01 eV/A [2024-11-24 13:47:22,497] DEEPMD INFO Virial MAE : 1.055419e+02 eV [2024-11-24 13:47:22,497] DEEPMD INFO Virial RMSE : 1.807917e+02 eV [2024-11-24 13:47:22,497] DEEPMD INFO Virial MAE/Natoms : 5.496976e-01 eV [2024-11-24 13:47:22,497] DEEPMD INFO Virial RMSE/Natoms : 9.416233e-01 eV [2024-11-24 13:47:22,497] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:47:22,497] DEEPMD INFO # ---------------output of dp test--------------- [2024-11-24 13:47:22,497] DEEPMD INFO # testing system : ../data/data_3 [2024-11-24 13:47:22,969] DEEPMD INFO # number of test data : 80 [2024-11-24 13:47:22,969] DEEPMD INFO Energy MAE : 6.737088e-01 eV [2024-11-24 13:47:22,969] DEEPMD INFO Energy RMSE : 8.147767e-01 eV [2024-11-24 13:47:22,969] DEEPMD INFO Energy MAE/Natoms : 3.508900e-03 eV [2024-11-24 13:47:22,969] DEEPMD INFO Energy RMSE/Natoms : 4.243629e-03 eV [2024-11-24 13:47:22,969] DEEPMD INFO Force MAE : 5.249346e-01 eV/A [2024-11-24 13:47:22,969] DEEPMD INFO Force RMSE : 6.895287e-01 eV/A 
[2024-11-24 13:47:22,969] DEEPMD INFO Virial MAE : 1.051727e+02 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial RMSE : 1.802760e+02 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial MAE/Natoms : 5.477746e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial RMSE/Natoms : 9.389376e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO # ----------------------------------------------- [2024-11-24 13:47:22,970] DEEPMD INFO # ----------weighted average of errors----------- [2024-11-24 13:47:22,970] DEEPMD INFO # number of systems : 4 [2024-11-24 13:47:22,970] DEEPMD INFO Energy MAE : 6.614471e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO Energy RMSE : 8.277423e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO Energy MAE/Natoms : 3.445037e-03 eV [2024-11-24 13:47:22,970] DEEPMD INFO Energy RMSE/Natoms : 4.311158e-03 eV [2024-11-24 13:47:22,970] DEEPMD INFO Force MAE : 5.225542e-01 eV/A [2024-11-24 13:47:22,970] DEEPMD INFO Force RMSE : 6.867169e-01 eV/A [2024-11-24 13:47:22,970] DEEPMD INFO Virial MAE : 1.054252e+02 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial RMSE : 1.805807e+02 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial MAE/Natoms : 5.490895e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO Virial RMSE/Natoms : 9.405246e-01 eV [2024-11-24 13:47:22,970] DEEPMD INFO # -----------------------------------------------
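The "weighted average of errors" block in the output above can be checked by hand. The sketch below assumes that each system is weighted by its number of test frames (160, 80, 80, 80). That weighting is inferred from the printed numbers, not taken from the DeePMD-kit source, and only the energy MAE and RMSE are checked. The variable and function names are illustrative.

```python
import math

# (number of test frames, energy MAE in eV, energy RMSE in eV) per system,
# copied from the dp test output above.
systems = {
    "data_1": (160, 6.760290e-01, 8.614848e-01),
    "data_2": (80, 5.619077e-01, 7.171949e-01),
    "data_0": (80, 7.195613e-01, 8.736386e-01),
    "data_3": (80, 6.737088e-01, 8.147767e-01),
}

def weighted_mae(rows):
    # Assumption: the MAE is averaged linearly, weighted by frame count.
    total = sum(n for n, _, _ in rows)
    return sum(n * mae for n, mae, _ in rows) / total

def weighted_rmse(rows):
    # Assumption: the RMSE is combined through the mean of squared errors.
    total = sum(n for n, _, _ in rows)
    return math.sqrt(sum(n * rmse**2 for n, _, rmse in rows) / total)

rows = list(systems.values())
energy_mae = weighted_mae(rows)
energy_rmse = weighted_rmse(rows)
print(f"Energy MAE  (weighted): {energy_mae:.6e} eV")
print(f"Energy RMSE (weighted): {energy_rmse:.6e} eV")
```

Under this assumption the two values agree with the reported 6.614471e-01 eV and 8.277423e-01 eV to the printed precision.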
Benchmarking LAMMPS molecular dynamics speed
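Although these quickly trained models are nearly unusable for production, we can still measure the speed of the models from the different backends.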
[27]
%cd ../lmp
/deepmd-kit/examples/water/lmp
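Here we do not apply any time integrator such as NVE or NVT, so the coordinates are identical at every step. For each model, we first run 100 steps to get past the cold start, then run 500 steps for the actual timing.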
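The input files used for the individual models differ only in the model path passed to `pair_style deepmd`. As a minimal sketch, assuming the four model files named in this tutorial, the inputs could be generated from one template. The names `TEMPLATE`, `MODELS`, `make_input` and `write_inputs` are hypothetical helpers, not part of DeePMD-kit or LAMMPS.

```python
from pathlib import Path

# Template mirroring the LAMMPS input used in this tutorial; only the model
# path and the two run lengths are parameterized.
TEMPLATE = """\
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd {model}
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run {warmup}
run {timed}
"""

# Input-file stem -> model path, as used in the cells of this tutorial.
MODELS = {
    "tf": "../se_atten_compressible/frozen_model.pb",
    "tfc": "../se_atten_compressible/frozen_model_compressed.pb",
    "pt": "../se_atten_compressible/frozen_model.pth",
    "ptc": "../se_atten_compressible/frozen_model_compressed.pth",
}

def make_input(model, warmup=100, timed=500):
    """Return the LAMMPS input text for one model."""
    return TEMPLATE.format(model=model, warmup=warmup, timed=timed)

def write_inputs(directory="."):
    """Write one <stem>.in file per model and return the written paths."""
    paths = []
    for stem, model in MODELS.items():
        path = Path(directory) / f"{stem}.in"
        path.write_text(make_input(model))
        paths.append(path)
    return paths

print(make_input(MODELS["tf"]))
```

Calling `write_inputs()` in the `lmp` directory would produce `tf.in`, `tfc.in`, `pt.in` and `ptc.in`, each of which can then be run with `lmp -in <file>`.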
[48]
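%%bash
# Activate the DeePMD-kit environment
source /root/deepmd-kit/bin/activate /root/deepmd-kit
# Write the LAMMPS input for the uncompressed TensorFlow model
cat<<EOF > tf.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# 192 atoms replicated 4x4x4 = 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.pb
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# No time-integration fix is defined, so the atoms do not move
# 100 steps to get past the cold start, then 500 timed steps
run 100
run 500
EOF
lmp -in tf.in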
[bohrium-156-1225901:03558] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3820224512/shared_mem_cuda_pool.bohrium-156-1225901 could be created. [bohrium-156-1225901:03558] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 LAMMPS (29 Aug 2024) OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98) using 1 OpenMP thread(s) per MPI task DeePMD-kit: Successfully load libcudart.so.12 2024-11-24 14:24:14.762167: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 14:24:14.781782: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 14:24:14.787952: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp Reading data file ... triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid reading atoms ... 
192 atoms read_data CPU = 0.001 seconds Replication is creating a 4x4x4 = 64 times larger system... triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid 12288 atoms replicate CPU = 0.001 seconds Summary of lammps deepmd module ... >>> Info of deepmd-kit: installed to: /root/deepmd-kit source: source branch: HEAD source commit: b1be266 source commit at: 2024-11-23 01:37:55 -0800 support model ver.: 1.1 build variant: cuda build with tf inc: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include build with tf lib: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2 build with pt lib: torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so set tf intra_op_parallelism_threads: 0 set tf inter_op_parallelism_threads: 0 >>> Info of lammps module: DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. 2024-11-24 14:24:18.624530: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. 
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1732429458.625574 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 I0000 00:00:1732429458.627669 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 I0000 00:00:1732429458.627870 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 I0000 00:00:1732429458.628048 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 I0000 00:00:1732429458.628174 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 I0000 00:00:1732429458.628311 3558 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. 
See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-11-24 14:24:18.628434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0 2024-11-24 14:24:18.663748: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled use deepmd-kit at: /root/deepmd-kit >>> Info of model(s): using 1 model(s): ../se_atten_compressible/frozen_model.pb rcut in model: 6 ntypes in model: 2 CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE Your simulation uses code contributions which should be cited: - Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419 - USER-DEEPMD package: The log file lists these citations in BibTeX format. CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Neighbor list info ... update: every = 50 steps, delay = 0 steps, check = no max neighbors/atom: 2000, page size: 100000 master list distance cutoff = 6 ghost atom cutoff = 6 binsize = 3, bins = 17 17 17 1 neighbor lists, perpetual/occasional/extra = 1 0 0 (1) pair deepmd, perpetual attributes: full, newton on pair build: full/bin/atomonly stencil: full/bin/3d bin: standard Setting up Verlet run ... 
Unit style : metal Current step : 0 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 0 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 20 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 40 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 60 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 80 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 100 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 Loop time of 14.5451 on 1 procs for 100 steps with 12288 atoms Performance: 0.297 ns/day, 80.806 hours/ns, 6.875 timesteps/s, 84.482 katom-step/s 46.5% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total --------------------------------------------------------------- Pair | 14.464 | 14.464 | 14.464 | 0.0 | 99.45 Neigh | 0.066192 | 0.066192 | 0.066192 | 0.0 | 0.46 Comm | 0.011048 | 0.011048 | 0.011048 | 0.0 | 0.08 Output | 0.00050548 | 0.00050548 | 0.00050548 | 0.0 | 0.00 Modify | 7.5344e-05 | 7.5344e-05 | 7.5344e-05 | 0.0 | 0.00 Other | | 0.00289 | | | 0.02 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 2 Dangerous builds not checked WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Setting up Verlet run ... 
Unit style : metal Current step : 100 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 100 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 120 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 140 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 160 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 180 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 200 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 220 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 240 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 260 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 280 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 300 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 320 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 340 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 360 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 380 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 400 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 420 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 440 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 460 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 480 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 500 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 520 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 540 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 560 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 580 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 600 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 Loop time of 72.6601 on 1 procs for 500 steps with 12288 atoms Performance: 0.297 ns/day, 80.733 hours/ns, 6.881 timesteps/s, 84.558 katom-step/s 46.4% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total 
--------------------------------------------------------------- Pair | 72.258 | 72.258 | 72.258 | 0.0 | 99.45 Neigh | 0.33104 | 0.33104 | 0.33104 | 0.0 | 0.46 Comm | 0.054271 | 0.054271 | 0.054271 | 0.0 | 0.07 Output | 0.0024261 | 0.0024261 | 0.0024261 | 0.0 | 0.00 Modify | 0.00036951 | 0.00036951 | 0.00036951 | 0.0 | 0.00 Other | | 0.01433 | | | 0.02 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 10 Dangerous builds not checked Total wall time: 0:01:35
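The figure of interest in each run is the `Loop time of ...` line that LAMMPS prints at the end of a `run` command. The sketch below extracts those lines from a log and recomputes the step rate. The regular expression is written against the line format shown in the output above, and `parse_loop_times` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# "Loop time of 72.6601 on 1 procs for 500 steps with 12288 atoms"
LOOP_RE = re.compile(
    r"Loop time of (?P<seconds>[0-9.eE+-]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)

def parse_loop_times(log_text):
    """Return one dict per 'Loop time' line found in a LAMMPS log."""
    runs = []
    for match in LOOP_RE.finditer(log_text):
        seconds = float(match["seconds"])
        steps = int(match["steps"])
        atoms = int(match["atoms"])
        runs.append(
            {
                "seconds": seconds,
                "steps": steps,
                "atoms": atoms,
                "steps_per_s": steps / seconds,
                # Wall time per atom per step, in microseconds.
                "us_per_atom_step": seconds / (steps * atoms) * 1e6,
            }
        )
    return runs

# The two loop-time lines from the TensorFlow run above.
sample = (
    "Loop time of 14.5451 on 1 procs for 100 steps with 12288 atoms\n"
    "Loop time of 72.6601 on 1 procs for 500 steps with 12288 atoms\n"
)
runs = parse_loop_times(sample)
timed = runs[-1]  # the last run is the 500-step timed run
print(f"{timed['steps_per_s']:.3f} timesteps/s, "
      f"{timed['us_per_atom_step']:.2f} us per atom-step")
```

The recomputed rate for the 500-step run agrees with the 6.881 timesteps/s that LAMMPS reports in its `Performance:` line.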
[49]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
cat<<EOF > tfc.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model_compressed.pb
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run 100
run 500
EOF
lmp -in tfc.in
[bohrium-156-1225901:03665] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/4094951424/shared_mem_cuda_pool.bohrium-156-1225901 could be created. [bohrium-156-1225901:03665] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 LAMMPS (29 Aug 2024) OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98) using 1 OpenMP thread(s) per MPI task DeePMD-kit: Successfully load libcudart.so.12 2024-11-24 14:25:52.964270: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 14:25:52.983780: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 14:25:52.989813: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp Reading data file ... triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid reading atoms ... 
192 atoms read_data CPU = 0.001 seconds Replication is creating a 4x4x4 = 64 times larger system... triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid 12288 atoms replicate CPU = 0.001 seconds Summary of lammps deepmd module ... >>> Info of deepmd-kit: installed to: /root/deepmd-kit source: source branch: HEAD source commit: b1be266 source commit at: 2024-11-23 01:37:55 -0800 support model ver.: 1.1 build variant: cuda build with tf inc: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include build with tf lib: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2 build with pt lib: torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so set tf intra_op_parallelism_threads: 0 set tf inter_op_parallelism_threads: 0 >>> Info of lammps module: DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. 
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE Your simulation uses code contributions which should be cited: - Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419 - USER-DEEPMD package: The log file lists these citations in BibTeX format. CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Neighbor list info ... update: every = 50 steps, delay = 0 steps, check = no max neighbors/atom: 2000, page size: 100000 master list distance cutoff = 6 ghost atom cutoff = 6 binsize = 3, bins = 17 17 17 1 neighbor lists, perpetual/occasional/extra = 1 0 0 (1) pair deepmd, perpetual attributes: full, newton on pair build: full/bin/atomonly stencil: full/bin/3d bin: standard Setting up Verlet run ... Unit style : metal Current step : 0 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 0 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 20 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 40 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 60 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 80 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 100 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 Loop time of 3.94721 on 1 procs for 100 steps with 12288 atoms Performance: 1.094 ns/day, 21.929 hours/ns, 25.334 timesteps/s, 311.309 katom-step/s 67.0% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total --------------------------------------------------------------- Pair | 3.8685 | 3.8685 | 3.8685 | 0.0 | 98.01 Neigh | 0.066096 | 0.066096 | 0.066096 | 0.0 | 1.67 Comm | 0.0098926 | 0.0098926 | 0.0098926 | 0.0 | 0.25 Output | 0.00047323 | 0.00047323 | 0.00047323 | 0.0 | 0.01 Modify | 5.4836e-05 | 5.4836e-05 | 5.4836e-05 | 
0.0 | 0.00 Other | | 0.00223 | | | 0.06 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 2 Dangerous builds not checked WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Setting up Verlet run ... Unit style : metal Current step : 100 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 100 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 120 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 140 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 160 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 180 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 200 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 220 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 240 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 260 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 280 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 300 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 320 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 340 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 360 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 380 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 400 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 420 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 440 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 460 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 480 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 500 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 520 -1916365.1 
524.1124 -1915841 330 -42650.41 123348.33 540 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 560 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 580 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 600 -1916365.1 524.1124 -1915841 330 -42650.41 123348.33 Loop time of 19.6433 on 1 procs for 500 steps with 12288 atoms Performance: 1.100 ns/day, 21.826 hours/ns, 25.454 timesteps/s, 312.779 katom-step/s 67.2% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total --------------------------------------------------------------- Pair | 19.251 | 19.251 | 19.251 | 0.0 | 98.01 Neigh | 0.33065 | 0.33065 | 0.33065 | 0.0 | 1.68 Comm | 0.047609 | 0.047609 | 0.047609 | 0.0 | 0.24 Output | 0.0023861 | 0.0023861 | 0.0023861 | 0.0 | 0.01 Modify | 0.00031468 | 0.00031468 | 0.00031468 | 0.0 | 0.00 Other | | 0.01086 | | | 0.06 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 10 Dangerous builds not checked Total wall time: 0:00:32
[50]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
cat<<EOF > pt.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.pth
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run 100
run 500
EOF
lmp -in pt.in
[bohrium-156-1225901:03779] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3188326400/shared_mem_cuda_pool.bohrium-156-1225901 could be created. [bohrium-156-1225901:03779] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 LAMMPS (29 Aug 2024) OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98) using 1 OpenMP thread(s) per MPI task DeePMD-kit: Successfully load libcudart.so.12 2024-11-24 14:26:27.643570: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-24 14:26:27.663144: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-24 14:26:27.669205: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp Reading data file ... triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid reading atoms ... 
192 atoms read_data CPU = 0.001 seconds Replication is creating a 4x4x4 = 64 times larger system... triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0) 1 by 1 by 1 MPI processor grid 12288 atoms replicate CPU = 0.001 seconds Summary of lammps deepmd module ... >>> Info of deepmd-kit: installed to: /root/deepmd-kit source: source branch: HEAD source commit: b1be266 source commit at: 2024-11-23 01:37:55 -0800 support model ver.: 1.1 build variant: cuda build with tf inc: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include build with tf lib: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2 build with pt lib: torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so set tf intra_op_parallelism_threads: 0 set tf inter_op_parallelism_threads: 0 >>> Info of lammps module: use deepmd-kit at: /root/deepmd-kitload model from: ../se_atten_compressible/frozen_model.pth to gpu 0 DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information. 
>>> Info of model(s): using 1 model(s): ../se_atten_compressible/frozen_model.pth rcut in model: 6 ntypes in model: 2 CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE Your simulation uses code contributions which should be cited: - Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419 - USER-DEEPMD package: The log file lists these citations in BibTeX format. CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Neighbor list info ... update: every = 50 steps, delay = 0 steps, check = no max neighbors/atom: 2000, page size: 100000 master list distance cutoff = 6 ghost atom cutoff = 6 binsize = 3, bins = 17 17 17 1 neighbor lists, perpetual/occasional/extra = 1 0 0 (1) pair deepmd, perpetual attributes: full, newton on pair build: full/bin/atomonly stencil: full/bin/3d bin: standard Setting up Verlet run ... Unit style : metal Current step : 0 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 0 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 20 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 40 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 60 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 80 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 Loop time of 15.9732 on 1 procs for 100 steps with 12288 atoms Performance: 0.270 ns/day, 88.740 hours/ns, 6.260 timesteps/s, 76.929 katom-step/s 64.4% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total --------------------------------------------------------------- Pair | 15.891 | 15.891 | 15.891 | 0.0 | 99.49 Neigh | 0.065974 | 0.065974 | 0.065974 | 0.0 | 0.41 Comm | 0.011833 | 0.011833 | 0.011833 | 
0.0 | 0.07 Output | 0.0005018 | 0.0005018 | 0.0005018 | 0.0 | 0.00 Modify | 8.3108e-05 | 8.3108e-05 | 8.3108e-05 | 0.0 | 0.00 Other | | 0.003758 | | | 0.02 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 2 Dangerous builds not checked WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60) Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule Setting up Verlet run ... Unit style : metal Current step : 100 Time step : 0.0005 Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes Step PotEng KinEng TotEng Temp Press Volume 100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 120 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 140 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 160 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 180 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 200 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 220 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 240 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 260 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 280 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 300 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 320 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 340 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 360 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 380 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 400 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 420 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 440 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 460 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 
480 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 500 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 520 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 540 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 560 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 580 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 600 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33 Loop time of 70.5811 on 1 procs for 500 steps with 12288 atoms Performance: 0.306 ns/day, 78.423 hours/ns, 7.084 timesteps/s, 87.049 katom-step/s 58.3% CPU use with 1 MPI tasks x 1 OpenMP threads MPI task timing breakdown: Section | min time | avg time | max time |%varavg| %total --------------------------------------------------------------- Pair | 70.171 | 70.171 | 70.171 | 0.0 | 99.42 Neigh | 0.32965 | 0.32965 | 0.32965 | 0.0 | 0.47 Comm | 0.058627 | 0.058627 | 0.058627 | 0.0 | 0.08 Output | 0.0024118 | 0.0024118 | 0.0024118 | 0.0 | 0.00 Modify | 0.00041647 | 0.00041647 | 0.00041647 | 0.0 | 0.00 Other | | 0.0185 | | | 0.03 Nlocal: 12288 ave 12288 max 12288 min Histogram: 1 0 0 0 0 0 0 0 0 0 Nghost: 11142 ave 11142 max 11142 min Histogram: 1 0 0 0 0 0 0 0 0 0 Neighs: 0 ave 0 max 0 min Histogram: 1 0 0 0 0 0 0 0 0 0 FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min Histogram: 1 0 0 0 0 0 0 0 0 0 Total # of neighbors = 1087488 Ave neighs/atom = 88.5 Neighbor list builds = 10 Dangerous builds not checked Total wall time: 0:01:33
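To compare the three runs completed so far, the 500-step loop times can be put side by side. The numbers below are copied from the outputs above (12288 atoms, one V100, a single measurement each), so the ratios are indicative only. The dictionary keys are illustrative labels.

```python
# Loop time in seconds of the 500-step timed run, copied from the logs above.
loop_seconds = {
    "tf": 72.6601,             # frozen_model.pb
    "tf_compressed": 19.6433,  # frozen_model_compressed.pb
    "pt": 70.5811,             # frozen_model.pth
}
STEPS = 500

# Speedup relative to the uncompressed TensorFlow model.
baseline = loop_seconds["tf"]
speedup = {name: baseline / seconds for name, seconds in loop_seconds.items()}

for name, seconds in loop_seconds.items():
    print(f"{name:>14}: {STEPS / seconds:6.3f} steps/s, "
          f"{speedup[name]:.2f}x relative to tf")
```

In these runs the compressed TensorFlow model is roughly 3.7 times faster than the uncompressed one, while the uncompressed PyTorch and TensorFlow models run at nearly the same speed.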
[51]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
# Write the LAMMPS input for the compressed PyTorch-backend model (.pth)
cat<<EOF > ptc.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# replicate the 192-atom cell 4x4x4 to get 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model_compressed.pth
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# no time-integration fix is defined, so the atoms do not move during the runs
run 100
run 500
EOF
lmp -in ptc.in
[bohrium-156-1225901:03792] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/643170304/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03792] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:28:03.862375: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:28:03.881941: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:28:03.887974: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
12288 atoms
replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /root/deepmd-kit
source:
source branch: HEAD
source commit: b1be266
source commit at: 2024-11-23 01:37:55 -0800
support model ver.: 1.1
build variant: cuda
build with tf inc: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
build with tf lib: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
build with pt lib: torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
>>> Info of lammps module:
use deepmd-kit at: /root/deepmd-kitload model from: ../se_atten_compressible/frozen_model_compressed.pth to gpu 0
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
>>> Info of model(s):
using 1 model(s): ../se_atten_compressible/frozen_model_compressed.pth
rcut in model: 6
ntypes in model: 2
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 50 steps, delay = 0 steps, check = no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6
ghost atom cutoff = 6
binsize = 3, bins = 17 17 17
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
0 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
20 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
40 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
60 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
80 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
Loop time of 6.43413 on 1 procs for 100 steps with 12288 atoms
Performance: 0.671 ns/day, 35.745 hours/ns, 15.542 timesteps/s, 190.982 katom-step/s
73.6% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 6.3528 | 6.3528 | 6.3528 | 0.0 | 98.74
Neigh | 0.065845 | 0.065845 | 0.065845 | 0.0 | 1.02
Comm | 0.011392 | 0.011392 | 0.011392 | 0.0 | 0.18
Output | 0.00051878 | 0.00051878 | 0.00051878 | 0.0 | 0.01
Modify | 6.772e-05 | 6.772e-05 | 6.772e-05 | 0.0 | 0.00
Other | | 0.003543 | | | 0.06
Nlocal: 12288 ave 12288 max 12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 11142 ave 11142 max 11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
Unit style : metal
Current step : 100
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
120 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
140 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
160 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
180 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
200 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
220 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
240 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
260 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
280 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
300 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
320 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
340 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
360 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
380 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
400 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
420 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
440 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
460 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
480 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
500 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
520 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
540 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
560 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
580 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
600 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
Loop time of 23.764 on 1 procs for 500 steps with 12288 atoms
Performance: 0.909 ns/day, 26.404 hours/ns, 21.040 timesteps/s, 258.542 katom-step/s
65.9% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 23.358 | 23.358 | 23.358 | 0.0 | 98.29
Neigh | 0.33099 | 0.33099 | 0.33099 | 0.0 | 1.39
Comm | 0.055314 | 0.055314 | 0.055314 | 0.0 | 0.23
Output | 0.0024087 | 0.0024087 | 0.0024087 | 0.0 | 0.01
Modify | 0.00037167 | 0.00037167 | 0.00037167 | 0.0 | 0.00
Other | | 0.01714 | | | 0.07
Nlocal: 12288 ave 12288 max 12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 11142 ave 11142 max 11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:00:37
[47]
%%bash
source /root/deepmd-kit/bin/activate /root/deepmd-kit
# Write the LAMMPS input for the JAX-backend model exported as a SavedModel
cat<<EOF > jax.in
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
# replicate the 192-atom cell 4x4x4 to get 12288 atoms
replicate 4 4 4
pair_style deepmd ../se_atten_compressible/frozen_model.savedmodel
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
# no time-integration fix is defined, so the atoms do not move during the runs
run 100
run 500
EOF
lmp -in jax.in
[bohrium-156-1225901:03478] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1225901.0/jf.0/3465609216/shared_mem_cuda_pool.bohrium-156-1225901 could be created.
[bohrium-156-1225901:03478] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-11-24 14:22:17.578334: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-24 14:22:17.597960: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-24 14:22:17.604057: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Loaded 1 plugins from /root/deepmd-kit/lib/deepmd_lmp
Reading data file ...
triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.001 seconds
Replication is creating a 4x4x4 = 64 times larger system...
triclinic box = (0 0 0) to (49.7788 49.7788 49.7788) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
12288 atoms
replicate CPU = 0.001 seconds
Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /root/deepmd-kit
source:
source branch: HEAD
source commit: b1be266
source commit at: 2024-11-23 01:37:55 -0800
support model ver.: 1.1
build variant: cuda
build with tf inc: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/include;/root/deepmd-kit/include
build with tf lib: /root/deepmd-kit/lib/python3.12/site-packages/tensorflow/libtensorflow_cc.so.2
build with pt lib: torch;torch_library;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1732355244818/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/root/deepmd-kit/lib/libnvrtc.so;/root/deepmd-kit/lib/libnvToolsExt.so;/root/deepmd-kit/lib/libcudart.so;/root/deepmd-kit/lib/python3.12/site-packages/torch/lib/libc10_cuda.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
>>> Info of lammps module:
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-11-24 14:22:17.636243: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:17.669163: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2024-11-24 14:22:17.669196: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:17.669257: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1732429337.670136 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429337.672073 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429337.672261 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772416 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772625 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429341.772778 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-24 14:22:21.772908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
2024-11-24 14:22:21.918753: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-11-24 14:22:21.923572: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2024-11-24 14:22:22.005609: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: ../se_atten_compressible/frozen_model.savedmodel
2024-11-24 14:22:22.077173: I tensorflow/cc/saved_model/loader.cc:462] SavedModel load for tags { serve }; Status: success: OK. Took 4440933 microseconds.
I0000 00:00:1732429342.134970 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135203 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135324 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135489 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1732429342.135609 3478 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-24 14:22:22.135746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29250 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:09.0, compute capability: 7.0
use deepmd-kit at: /root/deepmd-kit
>>> Info of model(s):
using 1 model(s): ../se_atten_compressible/frozen_model.savedmodel
rcut in model: 6
ntypes in model: 2
I0000 00:00:1732429342.378001 3521 service.cc:146] XLA service 0x7fd5f00039c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1732429342.378042 3521 service.cc:154] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-11-24 14:22:22.527898: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-11-24 14:22:22.576620: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 90300
I0000 00:00:1732429346.413280 3521 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 50 steps, delay = 0 steps, check = no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6
ghost atom cutoff = 6
binsize = 3, bins = 17 17 17
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
0 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
20 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
40 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
60 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
80 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
Loop time of 13.9058 on 1 procs for 100 steps with 12288 atoms
Performance: 0.311 ns/day, 77.254 hours/ns, 7.191 timesteps/s, 88.366 katom-step/s
66.0% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 13.821 | 13.821 | 13.821 | 0.0 | 99.39
Neigh | 0.065975 | 0.065975 | 0.065975 | 0.0 | 0.47
Comm | 0.014234 | 0.014234 | 0.014234 | 0.0 | 0.10
Output | 0.00047766 | 0.00047766 | 0.00047766 | 0.0 | 0.00
Modify | 0.00011797 | 0.00011797 | 0.00011797 | 0.0 | 0.00
Other | | 0.004056 | | | 0.03
Nlocal: 12288 ave 12288 max 12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 11142 ave 11142 max 11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 2
Dangerous builds not checked
WARNING: No fixes with time integration, atoms won't move (src/verlet.cpp:60)
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
Unit style : metal
Current step : 100
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 9.061 | 9.061 | 9.061 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
100 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
120 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
140 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
160 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
180 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
200 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
220 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
240 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
260 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
280 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
300 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
320 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
340 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
360 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
380 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
400 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
420 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
440 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
460 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
480 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
500 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
520 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
540 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
560 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
580 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
600 -1916323 524.1124 -1915798.8 330 -253197.37 123348.33
Loop time of 69.3794 on 1 procs for 500 steps with 12288 atoms
Performance: 0.311 ns/day, 77.088 hours/ns, 7.207 timesteps/s, 88.557 katom-step/s
65.7% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 68.955 | 68.955 | 68.955 | 0.0 | 99.39
Neigh | 0.33016 | 0.33016 | 0.33016 | 0.0 | 0.48
Comm | 0.07073 | 0.07073 | 0.07073 | 0.0 | 0.10
Output | 0.0024476 | 0.0024476 | 0.0024476 | 0.0 | 0.00
Modify | 0.0005643 | 0.0005643 | 0.0005643 | 0.0 | 0.00
Other | | 0.02045 | | | 0.03
Nlocal: 12288 ave 12288 max 12288 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 11142 ave 11142 max 11142 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 1.08749e+06 ave 1.08749e+06 max 1.08749e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 1087488
Ave neighs/atom = 88.5
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:01:33
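The ptc.in and jax.in inputs differ only in the model path passed to `pair_style deepmd`. As a rough sketch of how the shared part could be kept in one place, the snippet below renders both inputs from a single template. The names `TEMPLATE`, `render_input`, and `MODELS` are hypothetical and are not part of DeePMD-kit or LAMMPS.

```python
# Lines shared by ptc.in and jax.in, copied from the two cells;
# {model} marks the only field that differs between the files.
TEMPLATE = """\
units metal
boundary p p p
atom_style atomic
neighbor 0.0 bin
neigh_modify every 50 delay 0 check no
read_data water.lmp
mass 1 16
mass 2 2
replicate 4 4 4
pair_style deepmd {model}
pair_coeff * *
velocity all create 330.0 23456789
timestep 0.0005
thermo_style custom step pe ke etotal temp press vol
thermo 20
run 100
run 500
"""


def render_input(model):
    """Return the benchmark input text for one model file (hypothetical helper)."""
    return TEMPLATE.format(model=model)


# Model paths as used in the two cells.
MODELS = {
    "ptc.in": "../se_atten_compressible/frozen_model_compressed.pth",
    "jax.in": "../se_atten_compressible/frozen_model.savedmodel",
}

inputs = {name: render_input(path) for name, path in MODELS.items()}
for name, text in inputs.items():
    # Line index 9 is the pair_style line, the only one that changes.
    print(name, "->", text.splitlines()[9])
```

The sketch only builds the text in memory. Writing each entry of `inputs` to a file of the same name and passing it to `lmp -in` would reproduce the cells, assuming the same working directory and model files.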
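As the runs show, the different backends give very similar results for the models before and after compression. Note that models and systems can differ greatly, so this result does not necessarily hold in every case.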
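To compare runs numerically, one option is to read the `Loop time` summary line that LAMMPS prints at the end of each run. The following is a minimal sketch, assuming the line keeps the format seen in the outputs in this notebook. `LOOP_RE` and `loop_stats` are hypothetical names, and the two sample strings are copied from the 500-step runs of ptc.in and jax.in.

```python
import re

# Matches the LAMMPS summary line, for example:
# "Loop time of 23.764 on 1 procs for 500 steps with 12288 atoms"
LOOP_RE = re.compile(
    r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs\s+"
    r"for\s+(\d+)\s+steps\s+with\s+(\d+)\s+atoms"
)


def loop_stats(log_text):
    """Return one dict per 'Loop time' line found in a LAMMPS log (hypothetical helper)."""
    stats = []
    for m in LOOP_RE.finditer(log_text):
        seconds = float(m.group(1))
        steps = int(m.group(3))
        atoms = int(m.group(4))
        stats.append(
            {
                "seconds": seconds,
                "steps": steps,
                "atoms": atoms,
                "steps_per_s": steps / seconds,
            }
        )
    return stats


# Summary lines copied from the two 500-step runs in this notebook.
ptc_log = "Loop time of 23.764 on 1 procs for 500 steps with 12288 atoms"
jax_log = "Loop time of 69.3794 on 1 procs for 500 steps with 12288 atoms"

ptc = loop_stats(ptc_log)[0]
jax = loop_stats(jax_log)[0]
print(f"ptc.in: {ptc['steps_per_s']:.3f} timesteps/s")
print(f"jax.in: {jax['steps_per_s']:.3f} timesteps/s")
print(f"loop-time ratio jax.in / ptc.in: {jax['seconds'] / ptc['seconds']:.2f}")
```

The computed timesteps/s values should agree with the `Performance:` lines in the corresponding outputs. Passing a whole log file's text to `loop_stats` would return one entry per `run` command.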