Columbus Bootcamp | DPA-1: Solid-State Electrolyte in Practice (Model Training & Property Calculation)
©️ Copyright 2023 @ Authors
Author: 宋哲轩 📨
Date: 2023-07-20
Last modified: 2023-10-27 by 张琳爽
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the blue Connect button at the top of the page, choose the `deepmd-kit:2.2.1-cuda11.6-notebook` image and the `c12_m46_1 * NVIDIA GPU B` node configuration, mount the `electrolyte-gz V5` dataset, and after a short wait you are ready to run.
*This tutorial assumes basic familiarity with DeePMD-kit. If you are new to it, read "Quick Start with DeePMD-kit | Training a Deep Potential Molecular Dynamics Model for Methane" first.
🎯 Welcome to Columbus Bootcamp | DPA-1: Solid-State Electrolyte in Practice, Model Training
Drawing on the original DPA-1 paper (arXiv:2208.08236), this guide introduces the background and basic principles of the DPA-1 model and provides practical code examples to help you understand the key parameters. Using a solid-state electrolyte as the example, it then walks you step by step through training a DPA-1 potential on the training set from J. Chem. Phys. 154, 094703 (2021).
Come meet DPA-1 with us and open a new chapter in your exploration of potential models!
📐 One day, your advisor asks you to study a solid-state electrolyte with molecular dynamics simulations:
Student A: rule out classical molecular dynamics (CMD), whose accuracy falls short 🙅♀️
Student B: rule out ab initio molecular dynamics (AIMD), which is highly accurate but cannot handle large systems or long time scales 🙅♀️
You: naturally, go with machine-learning molecular dynamics (MLMD), the hottest approach right now and the one that combines efficiency with accuracy 🙌
👍 Advisor: Agreed. Go for it!
After a night spent digging into machine-learning potentials, you can't help but wonder: machine-learning potentials are practical, but is there a model I can use right out of the box? And if not, could I start from some publicly available model and, with only minor adjustments on my own dataset, obtain a reliable model?
You find that existing potential models and methods have shortcomings:
- Some general-purpose models apply only to limited scenarios and cover a small chemical space
- You can build a configurationally rich dataset with tools such as dpgen and retrain a model, but the cost is high
By now you fully appreciate it: what you need is a large-scale pretrained model/potential that lets you obtain a potential suited to your application while saving both time and money 🚀.
📣 Recently, researchers Duo Zhang (张铎), Hangrui Bi (毕航睿) and collaborators from DP Technology and the AI for Science Institute, Beijing posted a preprint on arXiv titled "DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation".
By encoding element types more effectively and introducing a key attention mechanism, DPA-1 greatly increases the capacity and transferability of earlier Deep Potential models, yielding a large pretrained model that covers most common elements of the periodic table. Transfer-learning results on different datasets show that the model can dramatically reduce the amount of data needed in new scenarios. More details can be found in the WeChat post and the original paper. Both training and molecular dynamics with DPA-1 are open-sourced in the DeePMD-kit project of the DeepModeling community. The work was carried out on Bohrium, DP Technology's scientific computing cloud platform.
You now know that DPA-1 is an attention-based DP model that describes interatomic interactions effectively; after pretraining, it can significantly reduce the extra work needed for downstream tasks.
All you need is one guide to quickly master how to train a DPA-1 potential from scratch (the dpa case), how to fine-tune an existing large model on your own dataset to obtain a potential (the dpa_finetune case), and how to use that potential for simulations in a real solid-state electrolyte scenario.
1. Learning objectives
After completing this tutorial, you will be able to:
- Understand the basic principles and applications of DPA-1;
- Train DPA-1 potential models hands-on for a solid-state electrolyte: reading the input script; training from scratch vs. fine-tuning an existing pretrained model; evaluating and testing the models;
- Use the molecular dynamics package LAMMPS to see how molecular dynamics works, with the solid-state electrolyte as the example;
- Compute and plot the radial distribution function (RDF) of the system;
- Compute the ionic diffusion coefficient D of the system and plot the mean squared displacement (MSD).
2. Introduction to DPA-1
👂 Can't wait to get hands-on? Jump straight to Section 3~
2.1 Background
Potential training has always chased a balance between accuracy and efficiency. Classical force fields are cheap and convenient, but their accuracy is hard to push further; AIMD (ab initio molecular dynamics), popular in recent years, raises the accuracy considerably, but its computational cost makes large systems and long time scales impractical. With the rise of AI for Science, machine learning makes it possible to train potentials that are both accurate and efficient (Figure 1: comparison of molecular dynamics simulation approaches). In the MLMD paradigm, quantum-mechanical (QM) calculations are no longer used to drive AIMD directly; instead, they prepare the dataset from which the machine-learning potential (MLP) is generated. Of course, AIMD results can also serve as the initial dataset.
However, because existing models transfer poorly and no general-purpose large model exists, scientists facing a new, complex system still essentially have to generate large amounts of calculation data and train a model from scratch before obtaining a usable, reasonably complete potential. As electronic-structure data accumulates, and by analogy with other AI fields such as computer vision (CV) and natural language processing (NLP), **"pretraining + fine-tuning on a small amount of data"** is the natural way to tackle this problem.
To realize this paradigm, we urgently need a model architecture with strong transferability that can accommodate most elements of the periodic table.
2.2 Methods
DPA-1 is a comprehensive upgrade of the DP model family. Using a key gated attention mechanism, it models interatomic interactions more completely; by training on existing data it can learn more of the hidden interaction information among atoms, which greatly improves transferability across datasets with different conformations and compositions and, in turn, improves sampling efficiency during data generation. By encoding element information, the model also expands its capacity for elements. The developers pretrained the model on a large dataset containing 56 elements and transferred it to various downstream tasks; the experiments show that the pretrained model substantially reduces the amount of data and training cost needed for downstream tasks while improving prediction accuracy, with far-reaching implications for molecular simulation (Figure 2: schematic of the DPA-1 model architecture).
Compared with earlier DP models, DPA-1 changes the methodology as follows:
- Descriptor: [type embedding] the atomic type is added as an input to the embedding matrix; an [attention mechanism] is introduced to reweight interatomic interactions according to the distances and angles between atoms
- Loss/fine-tuning procedure: to fine-tune the pretrained model on a new dataset, the energy bias of the pretrained model is first shifted using statistics of the new data, then part of the pretrained parameters are frozen and the remaining parameters are trained
On the inference side, DPA-1 retains the high efficiency of the DP family and can run molecular dynamics simulations of systems with many atoms and many elements.
2.3 Validation
Transferability tests
- Ternary alloy dataset
- Solid-state electrolyte (SSE) dataset
- High-entropy alloy (HEA) dataset
Note: the OC20 dataset consists of single adsorbates (small molecules) physically bound to catalyst surfaces, where the surfaces are periodic bulk materials covering 56 elements
To test the transferability gains brought by the DPA-1 architecture, the researchers deliberately split each training set into several subsets with large differences in composition and configuration (for AlMgCu, the single subset contains only elemental data; binary contains only binary data, i.e. Al-Mg, Al-Cu, Mg-Cu; ternary contains the remaining ternary data). They trained on some subsets and tested on the others to probe the model's transferability under extreme conditions (Figure 3: learning curves of energy and force for DeepPot-SE and DPA-1 under different settings and on different systems).
Compared with DeepPot-SE, the test accuracy of DPA-1 improves by as much as one to two orders of magnitude under some conditions, which shows that the model can learn implicit interatomic interaction information from the available data and further demonstrates its strong transferability.
Sample-efficiency test: example scenario (Figure 4: sample-efficiency performance of the models). With only a small amount of ternary data, DPA-1 still reaches high accuracy, saving roughly 90% of the ternary data compared with DeepPot-SE.
2.4 Interpretability
To further examine the interpretability of this pretrained model covering most of the periodic table, the researchers applied PCA to the learned element embeddings and visualized them, as shown in Figure 5:
All elements lie on a spiral in the latent space: elements of the same period descend along the spiral, while elements of the same group are arranged perpendicular to it, neatly mirroring their positions in the periodic table and nicely demonstrating the model's interpretability.
2.5 Outlook
DPA-1 opens a new paradigm for producing machine-learning potentials and demonstrates the feasibility of the "pretraining + light task-specific fine-tuning" workflow. Going forward, the researchers will keep working on automated production and testing of potentials, and will also explore multi-task training, unsupervised learning, model compression, and distillation, so that users can generate the potential needed for a downstream task with one click. Larger and more complete databases, and the integration of downstream tasks with the dflow workflow framework, are also promising directions.
3. Solid-state electrolyte in practice: training a DPA-1 potential
With the theory covered, let's get hands-on! In this section we use the solid-state electrolyte dataset LiGePS-SSE-PBE to train DPA-1 both from scratch and by fine-tuning.
Note: the dataset used in this tutorial comes from AIS-Square; if you need more models or data, go explore it~
3.1 Downloading the dataset
.
├── DeePMD_SSE
│   ├── 1000K
│   ├── 400K
│   ├── input_dpa.json
│   └── job_dpa.json
├── DeePMD_SSE_done
│   ├── 1000K
│   ├── 400K
│   ├── input_dpa.json
│   └── job_dpa.json
├── LICENSE
├── dpa_dataset
│   ├── dpa
│   └── dpa_finetune
└── dpa_dataset_done
    ├── dpa
    └── dpa_finetune

12 directories, 5 files

/data/study_examples/dpa_dataset/dpa
├── input.json
├── iter.000000
├── iter.000001
└── iter.000002

3 directories, 1 file
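If you want to peek inside one of these systems before training, the `dpdata` Python package (a DeePMD-kit dependency, so it should be available in this image) can load data in deepmd/npy format. A minimal sketch, assuming the path below exists as suggested by the tree above and by the `systems` entries in `input.json`:

```python
# Minimal sketch: inspect one labeled system of the LiGePS-SSE-PBE dataset with dpdata.
# The path is an assumption based on the tree above; adjust it to your mount point.
import dpdata

sys_path = "/data/study_examples/dpa_dataset/dpa/iter.000000/02.fp/data.000"
system = dpdata.LabeledSystem(sys_path, fmt="deepmd/npy")  # labeled: energies and forces included

print("atom types :", system.data["atom_names"])   # expected to match the type_map, e.g. ['Li', 'Ge', 'P', 'S']
print("atoms/frame:", system.get_natoms())         # 400 atoms per frame in this dataset
print("frames     :", system.get_nframes())
```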
3.2 Preparing the input script (training from scratch)
{ "model": { "descriptor": { "type": "se_atten", "sel": 60, "rcut_smth": 0.5, "rcut": 6.0, "neuron": [ 25, 50, 100 ], "resnet_dt": false, "axis_neuron": 16, "attn": 128, "attn_layer": 2, "attn_dotr": true, "attn_mask": false, "seed": 1801819940, "_activation_function": "tanh" }, "fitting_net": { "neuron": [ 240, 240, 240 ], "resnet_dt": true, "_coord_norm": true, "_type_fitting_net": false, "seed": 2375417769, "_activation_function": "tanh" }, "type_map": [ "Li", "Ge", "P", "S" ] }, "learning_rate": { "type": "exp", "start_lr": 0.001, "decay_steps": 50, "stop_lr": 3.51e-08 }, "loss": { "start_pref_e": 0.02, "limit_pref_e": 1, "start_pref_f": 1000, "limit_pref_f": 1, "start_pref_v": 0, "limit_pref_v": 0 }, "training": { "training_data": { "systems": [ "iter.000000/02.fp/data.000", "iter.000000/02.fp/data.001", "iter.000000/02.fp/data.002", "iter.000000/02.fp/data.003" ], "batch_size": 1 }, "validation_data": { "systems": [ "iter.000001/02.fp/data.000", "iter.000001/02.fp/data.001", "iter.000001/02.fp/data.002", "iter.000001/02.fp/data.003", "iter.000002/02.fp/data.000", "iter.000002/02.fp/data.001", "iter.000002/02.fp/data.002", "iter.000002/02.fp/data.003" ], "batch_size": 1 }, "numb_steps": 100, "seed": 3982377700, "_comment": "that's all", "disp_file": "lcurve.out", "disp_freq": 10, "numb_test": 1, "save_freq": 50, "save_ckpt": "model.ckpt", "disp_training": true, "time_training": true, "profiling": false, "profiling_file": "timeline.json" } }
Compared with the dp_se_e2 model, DPA-1 uses se_atten as the descriptor, so the changes are concentrated in the `descriptor` section:
"descriptor": {
"type": "se_atten",
"rcut_smth": 0.5,
"rcut": 6.0,
"sel": 60,
"neuron": [25,50,100],
"axis_neuron": 16,
"resnet_dt": false,
"attn": 128,
"attn_layer": 2,
"attn_mask": false,
"attn_dotr": true,
"seed": 1801819940,
"_activation_function": "tanh"
},
Compared with the familiar se_e2_a descriptor, the following parameters differ:
- `type`: "se_atten" selects the DPA-1 descriptor architecture;
- `rcut`: cutoff radius of the neighbor list; `rcut_smth`: where the smoothing starts;
- `sel`: the maximum total number of neighbor atoms taken into account. This value strongly affects DPA-1's training efficiency; there is usually no need to set it very large, and at most 200 is recommended. In the DPA-1 paper, 120 was already enough for training on the OC2M dataset containing 56 elements. Unlike the list expected by se_e2_a, here `sel` is a single integer;
- `neuron`: sizes of the embedding network;
- `axis_neuron`: size of the submatrix of the embedding matrix, i.e. the "axis matrix" in the DeepPot-SE paper;
- `resnet_dt`: if true, a timestep is used in the ResNet;
- `seed`: random seed used when initializing the model parameters.
Besides these, several attention-related parameters are new:
- `attn`: length of the hidden vectors in the attention step;
- `attn_layer`: number of attention layers; 2 is generally enough;
- `attn_mask`: whether to mask the diagonal of the attention weights;
- `attn_dotr`: whether to take the dot product of the attention weights with the product of relative coordinates, which acts like a gated attention mechanism.
The remaining parameters keep the same meaning as in the familiar "se_e2_a" descriptor; refer to the DeePMD-kit documentation for more detailed explanations.
Other notes:
For the rest of the model, DPA-1 currently supports only the "ener" type fitting network, so you can follow the standard fitting-net parameter settings.
In addition, DPA-1 enables type embedding by default to encode element-related information and enlarge the model's capacity for element types. The default parameters are:
"type_embedding":{
"neuron": [2, 4, 8],
"resnet_dt": false,
"seed": 1
},
The parameters have the same meaning as in the standard type-embedding settings; to change the defaults, add the block above to `type_embedding` manually and customize it.
DPA-1 is well suited to systems containing many elements, especially more than ten. In that case you need to specify explicitly which element each type index corresponds to, via the `type_map` parameter:
"type_map": [
"Li",
"Ge",
"P",
"S"
]
3.3 Model training (from scratch)
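The notebook cell that launches this run is not reproduced here. Below is a minimal sketch of the commands behind the log that follows, written as Python `subprocess` calls; the working directory (`dpa_dataset/dpa`) and the frozen-model name `dpa.pb` are assumptions based on the file tree above and the fine-tuning section below.

```python
# Minimal sketch of the from-scratch training run (paths and model name are assumptions).
# `dp train` reads the se_atten input script; `dp freeze` exports the trained graph.
import subprocess

subprocess.run(["dp", "train", "input.json"], check=True)      # writes lcurve.out and model.ckpt*
subprocess.run(["dp", "freeze", "-o", "dpa.pb"], check=True)   # freeze the checkpoint into dpa.pb
```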
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/compat.py:358: UserWarning: The argument training->numb_test has been deprecated since v2.0.0. Use training->validation_data->batch_size instead. warnings.warn( DEEPMD INFO Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step) OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-11 OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids. OMP: Info #157: KMP_AFFINITY: 12 available OS procs OMP: Info #158: KMP_AFFINITY: Uniform topology OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "core". OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core". OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core". 
OMP: Info #192: KMP_AFFINITY: 1 socket x 6 cores/socket x 2 threads/core (6 total cores) OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map: OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 4 maps to socket 0 core 2 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 5 maps to socket 0 core 2 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 6 maps to socket 0 core 3 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 8 maps to socket 0 core 4 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 9 maps to socket 0 core 4 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 10 maps to socket 0 core 5 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 11 maps to socket 0 core 5 thread 1 OMP: Info #254: KMP_AFFINITY: pid 942 tid 984 thread 1 bound to OS proc set 2 OMP: Info #254: KMP_AFFINITY: pid 942 tid 986 thread 2 bound to OS proc set 4 OMP: Info #254: KMP_AFFINITY: pid 942 tid 987 thread 3 bound to OS proc set 6 OMP: Info #254: KMP_AFFINITY: pid 942 tid 989 thread 5 bound to OS proc set 10 OMP: Info #254: KMP_AFFINITY: pid 942 tid 990 thread 6 bound to OS proc set 1 OMP: Info #254: KMP_AFFINITY: pid 942 tid 988 thread 4 bound to OS proc set 8 OMP: Info #254: KMP_AFFINITY: pid 942 tid 991 thread 7 bound to OS proc set 3 OMP: Info #254: KMP_AFFINITY: pid 942 tid 992 thread 8 bound to OS proc set 5 OMP: Info #254: KMP_AFFINITY: pid 942 tid 993 thread 9 bound to OS proc set 7 OMP: Info #254: KMP_AFFINITY: pid 942 tid 994 thread 10 bound to OS proc set 9 OMP: Info #254: KMP_AFFINITY: pid 942 tid 995 thread 11 bound to OS proc set 11 OMP: Info #254: KMP_AFFINITY: pid 942 tid 996 thread 12 bound to OS proc set 0 OMP: Info #254: KMP_AFFINITY: pid 942 tid 983 thread 13 bound to OS proc set 2 OMP: Info #254: KMP_AFFINITY: pid 942 tid 997 thread 14 bound to OS proc set 4 OMP: Info #254: KMP_AFFINITY: pid 942 tid 999 thread 16 bound to OS proc set 8 OMP: Info #254: KMP_AFFINITY: pid 942 tid 998 thread 15 bound to OS proc set 6 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1000 thread 17 bound to OS proc set 10 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1001 thread 18 bound to OS proc set 1 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1002 thread 19 bound to OS proc set 3 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1003 thread 20 bound to OS proc set 5 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1004 thread 21 bound to OS proc set 7 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1005 thread 22 bound to OS proc set 9 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1006 thread 23 bound to OS proc set 11 OMP: Info #254: KMP_AFFINITY: pid 942 tid 1007 thread 24 bound to OS proc set 0 DEEPMD INFO training data with min nbor dist: 1.7120899465949608 DEEPMD INFO training data with max nbor size: [56] DEEPMD INFO _____ _____ __ __ _____ _ _ _ DEEPMD INFO | __ \ | __ \ | \/ || __ \ | | (_)| | DEEPMD INFO | | | | ___ ___ | |__) || \ / || | | | ______ | | __ _ | |_ DEEPMD INFO | | | | / _ \ / _ \| ___/ | |\/| || | | ||______|| |/ /| || __| DEEPMD INFO | |__| || __/| __/| | | | | || |__| | | < | || |_ DEEPMD INFO |_____/ \___| \___||_| |_| |_||_____/ |_|\_\|_| \__| DEEPMD INFO Please read and cite: DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 
228, 178-184 (2018) DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1678943793317/work/_skbuild/linux-x86_64-3.10/cmake-install DEEPMD INFO source : v2.2.1 DEEPMD INFO source brach: HEAD DEEPMD INFO source commit: 3ac8c4c7 DEEPMD INFO source commit at: 2023-03-16 12:33:24 +0800 DEEPMD INFO build float prec: double DEEPMD INFO build variant: cuda DEEPMD INFO build with tf inc: /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/../../../../include DEEPMD INFO build with tf lib: DEEPMD INFO ---Summary of the training--------------------------------------- DEEPMD INFO running on: bohrium-14923-1052604 DEEPMD INFO computing device: gpu:0 DEEPMD INFO CUDA_VISIBLE_DEVICES: unset DEEPMD INFO Count of visible GPU: 1 DEEPMD INFO num_intra_threads: 0 DEEPMD INFO num_inter_threads: 0 DEEPMD INFO ----------------------------------------------------------------- DEEPMD INFO ---Summary of DataSystem: training ----------------------------------------------- DEEPMD INFO found 4 system(s): DEEPMD INFO system natoms bch_sz n_bch prob pbc DEEPMD INFO iter.000000/02.fp/data.000 400 1 127 0.241 T DEEPMD INFO iter.000000/02.fp/data.001 400 1 131 0.248 T DEEPMD INFO iter.000000/02.fp/data.002 400 1 133 0.252 T DEEPMD INFO iter.000000/02.fp/data.003 400 1 137 0.259 T DEEPMD INFO -------------------------------------------------------------------------------------- DEEPMD INFO ---Summary of DataSystem: validation ----------------------------------------------- DEEPMD INFO found 8 system(s): DEEPMD INFO system natoms bch_sz n_bch prob pbc DEEPMD INFO iter.000001/02.fp/data.000 400 1 117 0.111 T DEEPMD INFO iter.000001/02.fp/data.001 400 1 137 0.129 T DEEPMD INFO iter.000001/02.fp/data.002 400 1 137 0.129 T DEEPMD INFO iter.000001/02.fp/data.003 400 1 138 0.130 T DEEPMD INFO iter.000002/02.fp/data.000 400 1 133 0.126 T DEEPMD INFO iter.000002/02.fp/data.001 400 1 133 0.126 T DEEPMD INFO iter.000002/02.fp/data.002 400 1 134 0.127 T DEEPMD INFO iter.000002/02.fp/data.003 400 1 129 0.122 T DEEPMD INFO -------------------------------------------------------------------------------------- DEEPMD INFO training without frame parameter DEEPMD INFO data stating... (this step may take long time) OMP: Info #254: KMP_AFFINITY: pid 942 tid 942 thread 0 bound to OS proc set 0 DEEPMD INFO built lr DEEPMD INFO built network DEEPMD INFO built training WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. 
DEEPMD INFO initialize model from scratch DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.005925, final lr will be 3.51e-08 DEEPMD INFO batch 10 training time 3.86 s, testing time 0.13 s DEEPMD INFO batch 20 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 30 training time 1.91 s, testing time 0.13 s DEEPMD INFO batch 40 training time 1.92 s, testing time 0.14 s DEEPMD INFO batch 50 training time 1.92 s, testing time 0.13 s DEEPMD INFO saved checkpoint model.ckpt DEEPMD INFO batch 60 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 70 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 80 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 90 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 100 training time 1.93 s, testing time 0.13 s DEEPMD INFO saved checkpoint model.ckpt DEEPMD INFO average training time: 0.1919 s/batch (exclude first 10 batches) DEEPMD INFO finished training DEEPMD INFO wall time: 25.499 s WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) DEEPMD INFO The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam'] WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.convert_variables_to_constants` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.convert_variables_to_constants` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. 
Instructions for updating: Use `tf.compat.v1.graph_util.extract_sub_graph` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.extract_sub_graph` DEEPMD INFO 1352 ops in the final graph.
3.4 Fine-tuning the model
{ "model": { "type_embedding":{"trainable": true}, "descriptor": {"trainable": true}, "fitting_net": {"trainable": true}, "type_map": [ "Li", "Ge", "P", "S" ] }, "learning_rate": { "type": "exp", "start_lr": 0.001, "decay_steps": 50, "stop_lr": 3.51e-08 }, "loss": { "type": "ener", "start_pref_e": 0.02, "limit_pref_e": 1, "start_pref_f": 1000, "limit_pref_f": 1, "start_pref_v": 0, "limit_pref_v": 0 }, "training": { "training_data": { "systems": [ "iter.000000/02.fp/data.000", "iter.000000/02.fp/data.001", "iter.000000/02.fp/data.002", "iter.000000/02.fp/data.003" ], "batch_size": 1 }, "validation_data": { "systems": [ "iter.000001/02.fp/data.000", "iter.000001/02.fp/data.001", "iter.000001/02.fp/data.002", "iter.000001/02.fp/data.003", "iter.000002/02.fp/data.000", "iter.000002/02.fp/data.001", "iter.000002/02.fp/data.002", "iter.000002/02.fp/data.003" ], "batch_size": 1 }, "numb_steps": 100, "seed": 3982377700, "_comment": "that's all", "disp_file": "lcurve.out", "disp_freq": 10, "numb_test": 1, "save_freq": 50, "save_ckpt": "model.ckpt", "disp_training": true, "time_training": true, "profiling": false, "profiling_file": "timeline.json" } }
In the fine-tuning input file we only need to mark `type_embedding`, `descriptor`, and `fitting_net` as trainable; there is no need to write out the full sections again:
"model": {
"type_embedding":{"trainable": true},
"descriptor": {"trainable": true},
"fitting_net": {"trainable": true},
In this tutorial we simply use the dpa.pb model we just trained as the pretrained model for the fine-tuning example.
The fine-tuning command adds the `--finetune dpa.pb` option.
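A minimal sketch of the corresponding fine-tuning run, again as `subprocess` calls; the frozen output name `dpa_finetune.pb` is only illustrative:

```python
# Minimal sketch of the fine-tuning run: the same `dp train`, plus --finetune pointing at
# the frozen pretrained model. The frozen output name below is an assumption.
import subprocess

subprocess.run(["dp", "train", "input.json", "--finetune", "dpa.pb"], check=True)
subprocess.run(["dp", "freeze", "-o", "dpa_finetune.pb"], check=True)
```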
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) DEEPMD INFO Change the model configurations according to the pretrained one... DEEPMD INFO Change the 'descriptor' from {'trainable': True} to {'type': 'se_atten', 'sel': 60, 'rcut_smth': 0.5, 'rcut': 6.0, 'neuron': [25, 50, 100], 'resnet_dt': False, 'axis_neuron': 16, 'attn': 128, 'attn_layer': 2, 'attn_dotr': True, 'attn_mask': False, 'seed': 1801819940, 'activation_function': 'tanh', 'type_one_side': False, 'precision': 'default', 'trainable': True, 'exclude_types': [], 'set_davg_zero': True}. DEEPMD INFO Change the 'fitting_net' from {'trainable': True} to {'neuron': [240, 240, 240], 'resnet_dt': True, 'seed': 2375417769, 'type': 'ener', 'numb_fparam': 0, 'numb_aparam': 0, 'activation_function': 'tanh', 'precision': 'default', 'trainable': True, 'rcond': 0.001, 'atom_ener': [], 'use_aparam_as_mask': False}. /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/compat.py:358: UserWarning: The argument training->numb_test has been deprecated since v2.0.0. Use training->validation_data->batch_size instead. warnings.warn( DEEPMD INFO Calculate neighbor statistics... (add --skip-neighbor-stat to skip this step) OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-11 OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids. OMP: Info #157: KMP_AFFINITY: 12 available OS procs OMP: Info #158: KMP_AFFINITY: Uniform topology OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "core". OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core". OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core". 
OMP: Info #192: KMP_AFFINITY: 1 socket x 6 cores/socket x 2 threads/core (6 total cores) OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map: OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 4 maps to socket 0 core 2 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 5 maps to socket 0 core 2 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 6 maps to socket 0 core 3 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 8 maps to socket 0 core 4 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 9 maps to socket 0 core 4 thread 1 OMP: Info #172: KMP_AFFINITY: OS proc 10 maps to socket 0 core 5 thread 0 OMP: Info #172: KMP_AFFINITY: OS proc 11 maps to socket 0 core 5 thread 1 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1079 thread 1 bound to OS proc set 2 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1082 thread 2 bound to OS proc set 4 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1083 thread 3 bound to OS proc set 6 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1084 thread 4 bound to OS proc set 8 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1085 thread 5 bound to OS proc set 10 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1086 thread 6 bound to OS proc set 1 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1087 thread 7 bound to OS proc set 3 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1088 thread 8 bound to OS proc set 5 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1089 thread 9 bound to OS proc set 7 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1090 thread 10 bound to OS proc set 9 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1091 thread 11 bound to OS proc set 11 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1092 thread 12 bound to OS proc set 0 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1080 thread 13 bound to OS proc set 2 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1093 thread 14 bound to OS proc set 4 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1094 thread 15 bound to OS proc set 6 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1095 thread 16 bound to OS proc set 8 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1096 thread 17 bound to OS proc set 10 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1097 thread 18 bound to OS proc set 1 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1098 thread 19 bound to OS proc set 3 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1099 thread 20 bound to OS proc set 5 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1100 thread 21 bound to OS proc set 7 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1101 thread 22 bound to OS proc set 9 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1102 thread 23 bound to OS proc set 11 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1103 thread 24 bound to OS proc set 0 DEEPMD INFO training data with min nbor dist: 1.7120899465949608 DEEPMD INFO training data with max nbor size: [56] DEEPMD INFO _____ _____ __ __ _____ _ _ _ DEEPMD INFO | __ \ | __ \ | \/ || __ \ | | (_)| | DEEPMD INFO | | | | ___ ___ | |__) || \ / || | | | ______ | | __ _ | |_ DEEPMD INFO | | | | / _ \ / _ \| ___/ | |\/| || | | ||______|| |/ /| || __| DEEPMD INFO | |__| || __/| __/| | | | | || |__| | | < | || |_ DEEPMD INFO |_____/ \___| \___||_| |_| |_||_____/ |_|\_\|_| \__| DEEPMD INFO Please read and cite: DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 
228, 178-184 (2018) DEEPMD INFO installed to: /home/conda/feedstock_root/build_artifacts/deepmd-kit_1678943793317/work/_skbuild/linux-x86_64-3.10/cmake-install DEEPMD INFO source : v2.2.1 DEEPMD INFO source brach: HEAD DEEPMD INFO source commit: 3ac8c4c7 DEEPMD INFO source commit at: 2023-03-16 12:33:24 +0800 DEEPMD INFO build float prec: double DEEPMD INFO build variant: cuda DEEPMD INFO build with tf inc: /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/../../../../include DEEPMD INFO build with tf lib: DEEPMD INFO ---Summary of the training--------------------------------------- DEEPMD INFO running on: bohrium-14923-1052604 DEEPMD INFO computing device: gpu:0 DEEPMD INFO CUDA_VISIBLE_DEVICES: unset DEEPMD INFO Count of visible GPU: 1 DEEPMD INFO num_intra_threads: 0 DEEPMD INFO num_inter_threads: 0 DEEPMD INFO ----------------------------------------------------------------- DEEPMD INFO ---Summary of DataSystem: training ----------------------------------------------- DEEPMD INFO found 4 system(s): DEEPMD INFO system natoms bch_sz n_bch prob pbc DEEPMD INFO iter.000000/02.fp/data.000 400 1 127 0.241 T DEEPMD INFO iter.000000/02.fp/data.001 400 1 131 0.248 T DEEPMD INFO iter.000000/02.fp/data.002 400 1 133 0.252 T DEEPMD INFO iter.000000/02.fp/data.003 400 1 137 0.259 T DEEPMD INFO -------------------------------------------------------------------------------------- DEEPMD INFO ---Summary of DataSystem: validation ----------------------------------------------- DEEPMD INFO found 8 system(s): DEEPMD INFO system natoms bch_sz n_bch prob pbc DEEPMD INFO iter.000001/02.fp/data.000 400 1 117 0.111 T DEEPMD INFO iter.000001/02.fp/data.001 400 1 137 0.129 T DEEPMD INFO iter.000001/02.fp/data.002 400 1 137 0.129 T DEEPMD INFO iter.000001/02.fp/data.003 400 1 138 0.130 T DEEPMD INFO iter.000002/02.fp/data.000 400 1 133 0.126 T DEEPMD INFO iter.000002/02.fp/data.001 400 1 133 0.126 T DEEPMD INFO iter.000002/02.fp/data.002 400 1 134 0.127 T DEEPMD INFO iter.000002/02.fp/data.003 400 1 129 0.122 T DEEPMD INFO -------------------------------------------------------------------------------------- DEEPMD INFO training without frame parameter DEEPMD INFO Changing energy bias in pretrained model for types ['Li', 'Ge', 'P', 'S']... (this step may take long time) WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. DEEPMD INFO Adjust batch size from 1024 to 2048 DEEPMD INFO Adjust batch size from 2048 to 4096 DEEPMD INFO Adjust batch size from 4096 to 8192 OMP: Info #254: KMP_AFFINITY: pid 1038 tid 1038 thread 0 bound to OS proc set 0 DEEPMD INFO RMSE of atomic energy after linear regression is: 0.02139649655185475 eV/atom. DEEPMD INFO Change energy bias of ['Li', 'Ge', 'P', 'S'] from [-4.17483491 -0.41748349 -0.83496698 -5.00980189] to [-4.17723096 -0.4177231 -0.83544619 -5.01267715]. 
DEEPMD INFO built lr DEEPMD INFO built network DEEPMD INFO built training WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. DEEPMD INFO initialize training from the frozen pretrained model DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 50, decay_rate 0.005925, final lr will be 3.51e-08 DEEPMD INFO batch 10 training time 3.85 s, testing time 0.13 s DEEPMD INFO batch 20 training time 1.92 s, testing time 0.13 s DEEPMD INFO batch 30 training time 1.93 s, testing time 0.13 s DEEPMD INFO batch 40 training time 1.93 s, testing time 0.13 s DEEPMD INFO batch 50 training time 1.93 s, testing time 0.13 s DEEPMD INFO saved checkpoint model.ckpt DEEPMD INFO batch 60 training time 1.94 s, testing time 0.13 s DEEPMD INFO batch 70 training time 1.94 s, testing time 0.13 s DEEPMD INFO batch 80 training time 1.93 s, testing time 0.13 s DEEPMD INFO batch 90 training time 1.93 s, testing time 0.13 s DEEPMD INFO batch 100 training time 1.94 s, testing time 0.13 s DEEPMD INFO saved checkpoint model.ckpt DEEPMD INFO average training time: 0.1932 s/batch (exclude first 10 batches) DEEPMD INFO finished training DEEPMD INFO wall time: 24.724 s WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) DEEPMD INFO The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam'] WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.convert_variables_to_constants` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:354: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. 
Instructions for updating: Use `tf.compat.v1.graph_util.convert_variables_to_constants` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.extract_sub_graph` WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.compat.v1.graph_util.extract_sub_graph` DEEPMD INFO 1354 ops in the final graph.
That's it: a potential fine-tuned from an existing pretrained model is ready! To sum up, it differs from ordinary training in:
- the parameter settings in input.json
- the extra command-line option
--finetune pretrained.pb
3.5 Model validation
We have now trained the solid-state electrolyte DPA potential both from scratch and by fine-tuning. Let's check how the models perform using lcurve.out (the learning-curve output file, which records training/validation errors and the learning rate) and the `dp test` command!
📌 Note: to save time, the runs in this tutorial use only 100 training steps, which is far from converged; in a real training run, remember to use an appropriately long training schedule~
For the analysis below we therefore use the finished 10000-step potentials in the /dpa_dataset_done folder, for both the dpa and dpa_finetune versions:
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr 0 2.77e+01 2.70e+01 2.11e+00 2.13e+00 8.55e-01 8.32e-01 1.0e-03 100 1.84e+01 1.60e+01 2.27e-01 2.07e-01 6.10e-01 5.30e-01 9.0e-04 200 1.03e+01 9.48e+00 3.84e-02 4.17e-02 3.62e-01 3.32e-01 8.1e-04 300 6.73e+00 5.87e+00 2.05e-02 3.72e-02 2.48e-01 2.16e-01 7.4e-04 400 6.47e+00 5.56e+00 1.41e-02 2.68e-02 2.51e-01 2.16e-01 6.6e-04 500 5.70e+00 5.34e+00 1.05e-01 1.03e-01 2.26e-01 2.11e-01 6.0e-04 600 5.35e+00 4.65e+00 1.37e-02 1.47e-03 2.30e-01 2.00e-01 5.4e-04 700 3.66e+00 4.01e+00 4.89e-03 1.92e-02 1.65e-01 1.81e-01 4.9e-04 800 4.18e+00 3.87e+00 2.61e-02 1.32e-02 1.98e-01 1.84e-01 4.4e-04 900 3.86e+00 4.40e+00 1.24e-02 1.69e-02 1.93e-01 2.20e-01 4.0e-04 1000 3.93e+00 3.38e+00 3.48e-03 2.31e-04 2.07e-01 1.78e-01 3.6e-04 1100 3.11e+00 3.40e+00 4.19e-02 3.90e-02 1.69e-01 1.85e-01 3.2e-04 1200 2.91e+00 3.10e+00 8.41e-04 1.30e-02 1.70e-01 1.81e-01 2.9e-04 1300 3.04e+00 3.00e+00 5.49e-02 6.85e-02 1.78e-01 1.70e-01 2.6e-04 1400 2.85e+00 2.86e+00 2.31e-02 3.11e-02 1.83e-01 1.82e-01 2.4e-04 1500 2.65e+00 2.35e+00 7.19e-03 1.67e-02 1.80e-01 1.59e-01 2.1e-04 1600 2.56e+00 2.36e+00 1.97e-02 1.81e-02 1.82e-01 1.68e-01 1.9e-04 1700 2.17e+00 2.23e+00 9.60e-03 6.47e-03 1.63e-01 1.68e-01 1.7e-04 1800 1.88e+00 2.32e+00 4.03e-02 2.55e-02 1.37e-01 1.80e-01 1.6e-04 1900 2.07e+00 2.00e+00 1.69e-02 1.67e-02 1.71e-01 1.65e-01 1.4e-04 2000 1.80e+00 1.70e+00 9.92e-03 1.58e-03 1.57e-01 1.49e-01 1.3e-04 2100 1.77e+00 1.69e+00 9.73e-03 1.04e-02 1.62e-01 1.55e-01 1.2e-04 2200 1.73e+00 1.78e+00 1.36e-02 1.87e-02 1.67e-01 1.70e-01 1.0e-04 2300 1.65e+00 1.57e+00 3.43e-03 2.75e-03 1.69e-01 1.61e-01 9.4e-05 2400 1.47e+00 1.48e+00 1.37e-02 3.84e-03 1.56e-01 1.59e-01 8.5e-05 2500 1.64e+00 1.53e+00 9.75e-03 1.40e-02 1.84e-01 1.71e-01 7.7e-05 2600 1.33e+00 1.35e+00 4.58e-03 2.85e-03 1.59e-01 1.61e-01 6.9e-05 2700 1.38e+00 1.17e+00 1.84e-02 1.80e-02 1.67e-01 1.40e-01 6.3e-05 2800 1.27e+00 1.29e+00 1.27e-02 1.31e-02 1.64e-01 1.67e-01 5.7e-05 2900 1.13e+00 1.35e+00 9.52e-03 1.00e-02 1.54e-01 1.85e-01 5.1e-05 3000 1.15e+00 1.17e+00 1.32e-03 4.65e-03 1.68e-01 1.70e-01 4.6e-05 3100 1.02e+00 1.10e+00 1.02e-02 1.51e-02 1.53e-01 1.62e-01 4.2e-05 3200 1.01e+00 9.83e-01 3.27e-03 3.68e-03 1.62e-01 1.58e-01 3.8e-05 3300 1.10e+00 9.74e-01 5.40e-03 7.01e-03 1.86e-01 1.63e-01 3.4e-05 3400 9.97e-01 9.72e-01 2.06e-02 1.52e-02 1.62e-01 1.65e-01 3.1e-05 3500 7.98e-01 8.50e-01 3.38e-03 3.24e-03 1.49e-01 1.59e-01 2.8e-05 3600 9.03e-01 8.77e-01 7.51e-03 5.28e-03 1.75e-01 1.71e-01 2.5e-05 3700 7.84e-01 7.79e-01 4.97e-03 1.38e-02 1.61e-01 1.51e-01 2.2e-05 3800 8.57e-01 7.19e-01 2.55e-03 3.76e-03 1.86e-01 1.55e-01 2.0e-05 3900 7.66e-01 7.34e-01 6.32e-03 3.05e-03 1.72e-01 1.66e-01 1.8e-05 4000 7.01e-01 7.30e-01 2.80e-03 3.60e-03 1.67e-01 1.74e-01 1.7e-05 4100 6.68e-01 6.58e-01 3.87e-03 6.01e-03 1.66e-01 1.62e-01 1.5e-05 4200 6.33e-01 5.37e-01 4.60e-03 2.03e-03 1.65e-01 1.41e-01 1.3e-05 4300 6.43e-01 6.46e-01 3.32e-04 7.16e-03 1.77e-01 1.74e-01 1.2e-05 4400 6.01e-01 5.73e-01 6.53e-03 1.68e-03 1.70e-01 1.66e-01 1.1e-05 4500 5.48e-01 5.39e-01 5.67e-04 2.45e-03 1.66e-01 1.63e-01 9.9e-06 4600 5.43e-01 4.52e-01 1.52e-03 8.33e-04 1.72e-01 1.43e-01 8.9e-06 4700 4.85e-01 5.17e-01 4.79e-03 6.52e-03 1.58e-01 1.66e-01 8.1e-06 4800 5.15e-01 4.83e-01 4.06e-03 3.78e-04 1.77e-01 1.68e-01 7.3e-06 4900 4.81e-01 5.12e-01 9.57e-03 9.69e-03 1.61e-01 1.72e-01 6.6e-06 5000 4.61e-01 4.26e-01 6.08e-03 4.93e-03 1.69e-01 1.58e-01 5.9e-06 5100 4.31e-01 3.86e-01 3.81e-03 3.99e-04 1.68e-01 1.53e-01 5.3e-06 5200 
4.24e-01 3.83e-01 4.62e-03 6.13e-03 1.72e-01 1.50e-01 4.8e-06 5300 4.07e-01 3.95e-01 4.31e-03 2.33e-03 1.72e-01 1.69e-01 4.4e-06 5400 3.73e-01 3.76e-01 1.07e-03 8.62e-04 1.68e-01 1.69e-01 3.9e-06 5500 4.02e-01 3.76e-01 8.93e-04 3.68e-05 1.88e-01 1.76e-01 3.5e-06 5600 3.47e-01 3.11e-01 9.55e-04 3.02e-03 1.69e-01 1.49e-01 3.2e-06 5700 3.15e-01 3.24e-01 2.12e-03 2.85e-03 1.58e-01 1.62e-01 2.9e-06 5800 3.35e-01 3.10e-01 3.21e-03 7.53e-04 1.73e-01 1.63e-01 2.6e-06 5900 3.01e-01 3.15e-01 9.53e-04 4.14e-03 1.64e-01 1.66e-01 2.4e-06 6000 2.71e-01 2.99e-01 2.64e-03 2.72e-03 1.50e-01 1.67e-01 2.1e-06 6100 2.81e-01 2.81e-01 3.77e-03 1.13e-03 1.59e-01 1.64e-01 1.9e-06 6200 2.87e-01 3.00e-01 1.26e-03 3.52e-03 1.73e-01 1.77e-01 1.7e-06 6300 2.85e-01 2.93e-01 2.24e-03 4.82e-03 1.76e-01 1.73e-01 1.6e-06 6400 2.90e-01 2.91e-01 4.72e-03 7.79e-03 1.76e-01 1.59e-01 1.4e-06 6500 2.51e-01 2.44e-01 3.29e-03 1.25e-03 1.60e-01 1.61e-01 1.3e-06 6600 2.50e-01 2.66e-01 9.02e-04 2.90e-03 1.70e-01 1.77e-01 1.1e-06 6700 2.51e-01 2.51e-01 3.05e-03 4.06e-03 1.71e-01 1.66e-01 1.0e-06 6800 2.46e-01 2.66e-01 2.51e-03 5.14e-03 1.73e-01 1.76e-01 9.3e-07 6900 2.37e-01 2.46e-01 2.11e-03 2.71e-03 1.72e-01 1.77e-01 8.4e-07 7000 2.12e-01 2.04e-01 2.36e-03 2.76e-03 1.56e-01 1.48e-01 7.6e-07 7100 2.10e-01 2.22e-01 8.80e-04 6.74e-04 1.61e-01 1.70e-01 6.9e-07 7200 2.10e-01 2.03e-01 3.29e-03 9.16e-04 1.56e-01 1.59e-01 6.2e-07 7300 2.02e-01 2.15e-01 1.87e-03 3.52e-03 1.59e-01 1.62e-01 5.6e-07 7400 1.97e-01 2.23e-01 8.85e-04 1.24e-03 1.60e-01 1.81e-01 5.1e-07 7500 2.11e-01 2.37e-01 2.15e-03 3.62e-03 1.71e-01 1.87e-01 4.6e-07 7600 2.04e-01 2.15e-01 2.25e-05 1.93e-03 1.72e-01 1.78e-01 4.1e-07 7700 1.95e-01 1.74e-01 1.09e-03 1.12e-03 1.66e-01 1.47e-01 3.7e-07 7800 1.83e-01 2.07e-01 1.04e-03 9.41e-04 1.58e-01 1.78e-01 3.4e-07 7900 1.89e-01 1.68e-01 2.24e-03 9.43e-04 1.61e-01 1.46e-01 3.0e-07 8000 1.84e-01 2.24e-01 1.60e-04 8.66e-03 1.63e-01 1.26e-01 2.7e-07 8100 1.96e-01 1.95e-01 2.31e-03 2.59e-03 1.71e-01 1.69e-01 2.5e-07 8200 2.24e-01 1.92e-01 6.95e-03 1.39e-04 1.58e-01 1.74e-01 2.2e-07 8300 1.90e-01 1.84e-01 2.67e-04 2.18e-03 1.73e-01 1.63e-01 2.0e-07 8400 1.83e-01 1.80e-01 2.83e-03 2.27e-04 1.60e-01 1.66e-01 1.8e-07 8500 1.80e-01 1.92e-01 1.73e-03 1.63e-03 1.64e-01 1.76e-01 1.6e-07 8600 2.01e-01 1.90e-01 2.55e-03 2.38e-04 1.82e-01 1.77e-01 1.5e-07 8700 1.91e-01 1.85e-01 1.81e-03 1.43e-03 1.76e-01 1.71e-01 1.3e-07 8800 1.98e-01 1.93e-01 4.48e-03 3.83e-03 1.67e-01 1.67e-01 1.2e-07 8900 1.85e-01 1.68e-01 3.47e-03 4.47e-04 1.63e-01 1.59e-01 1.1e-07 9000 1.75e-01 1.71e-01 2.19e-03 4.16e-04 1.61e-01 1.63e-01 9.8e-08 9100 1.72e-01 2.09e-01 2.31e-03 3.30e-03 1.59e-01 1.90e-01 8.8e-08 9200 1.80e-01 1.74e-01 4.00e-03 6.04e-04 1.55e-01 1.67e-01 8.0e-08 9300 1.93e-01 1.68e-01 2.28e-03 2.28e-04 1.81e-01 1.62e-01 7.2e-08 9400 2.03e-01 2.05e-01 3.86e-03 3.31e-03 1.82e-01 1.88e-01 6.5e-08 9500 1.93e-01 1.67e-01 4.89e-03 3.26e-04 1.62e-01 1.62e-01 5.9e-08 9600 1.79e-01 1.64e-01 5.69e-04 1.88e-03 1.74e-01 1.56e-01 5.3e-08 9700 1.78e-01 1.47e-01 2.07e-03 5.87e-04 1.69e-01 1.43e-01 4.8e-08 9800 1.71e-01 2.08e-01 1.80e-03 4.14e-03 1.64e-01 1.86e-01 4.3e-08 9900 1.83e-01 1.80e-01 4.06e-03 3.08e-03 1.60e-01 1.66e-01 3.9e-08 10000 1.65e-01 1.85e-01 2.19e-04 2.79e-03 1.62e-01 1.73e-01 3.5e-08
Want an intuitive comparison of the two models' lcurve files? Try the ready-made visualization script with one click~
From the lcurve comparison we can see that the fine-tuned model, which starts from the pretrained model, has lower energy and force losses at the beginning of training.
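If the ready-made script is not at hand, here is a minimal sketch of such a comparison; the two `lcurve.out` paths are assumptions and should point at the finished from-scratch and fine-tuned runs:

```python
# Minimal sketch: compare the validation energy/force RMSE of the two runs from lcurve.out.
# Paths are assumptions; adjust them to where the from-scratch and finetune runs live.
import numpy as np
import matplotlib.pyplot as plt

runs = {
    "from scratch": "dpa_dataset_done/dpa/lcurve.out",
    "finetune": "dpa_dataset_done/dpa_finetune/lcurve.out",
}

fig, (ax_e, ax_f) = plt.subplots(1, 2, figsize=(10, 4))
for label, path in runs.items():
    data = np.genfromtxt(path, names=True)            # column names come from the '# step ...' header
    ax_e.semilogy(data["step"], data["rmse_e_val"], label=label)
    ax_f.semilogy(data["step"], data["rmse_f_val"], label=label)
ax_e.set(xlabel="training step", ylabel="energy RMSE (validation)")
ax_f.set(xlabel="training step", ylabel="force RMSE (validation)")
ax_e.legend()
plt.tight_layout()
plt.show()
```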
Next, let's compute the correlation between the predicted and the reference data and visualize it.
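The evaluation below was produced with `dp test`. A minimal sketch of the call for one validation system is given here; the model name and system path are assumptions, and `-d detail` dumps per-frame reference/prediction files that are used for the parity plot later:

```python
# Minimal sketch: run `dp test` on one validation system and dump detail files
# (detail.e.out / detail.f.out) that hold reference vs. predicted values.
import subprocess

subprocess.run(
    ["dp", "test",
     "-m", "dpa_finetune.pb",              # frozen model to evaluate (name from the sketch above)
     "-s", "iter.000001/02.fp/data.000",   # one validation system
     "-n", "100",                          # number of test frames, as in the log below
     "-d", "detail"],                      # prefix of the dumped detail files
    check=True,
)
```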
WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.000 DEEPMD INFO Adjust batch size from 1024 to 2048 DEEPMD INFO Adjust batch size from 2048 to 4096 DEEPMD INFO Adjust batch size from 4096 to 8192 2023-10-27 12:08:49.148903: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 219.73MiB (rounded to 230400000)requested by op load/attention_layer_1/Softmax If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. Current allocation summary follows. Current allocation summary follows. 
2023-10-27 12:08:49.149185: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ******************************************************************************__*************_****** 2023-10-27 12:08:49.149229: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at softmax_op_gpu.cu.cc:219 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[8000,60,60] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc DEEPMD INFO Adjust batch size from 8192 to 4096 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 7.996145e-01 eV DEEPMD INFO Energy RMSE : 9.971957e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.999036e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.492989e-03 eV DEEPMD INFO Force MAE : 1.226169e-01 eV/A DEEPMD INFO Force RMSE : 1.660002e-01 eV/A DEEPMD INFO Virial MAE : 3.813648e+00 eV DEEPMD INFO Virial RMSE : 4.946080e+00 eV DEEPMD INFO Virial MAE/Natoms : 9.534121e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.236520e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.001 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 6.442529e-01 eV DEEPMD INFO Energy RMSE : 8.370321e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.610632e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.092580e-03 eV DEEPMD INFO Force MAE : 1.229773e-01 eV/A DEEPMD INFO Force RMSE : 1.667417e-01 eV/A DEEPMD INFO Virial MAE : 3.252968e+00 eV DEEPMD INFO Virial RMSE : 4.280641e+00 eV DEEPMD INFO Virial MAE/Natoms : 8.132419e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.070160e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.002 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 8.999357e-01 eV DEEPMD INFO Energy RMSE : 1.141360e+00 eV DEEPMD INFO Energy MAE/Natoms : 2.249839e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.853401e-03 eV DEEPMD INFO Force MAE : 1.218744e-01 eV/A DEEPMD INFO Force RMSE : 1.648103e-01 eV/A DEEPMD INFO Virial MAE : 3.209368e+00 eV DEEPMD INFO Virial RMSE : 4.163454e+00 eV DEEPMD INFO Virial MAE/Natoms : 8.023421e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.040863e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.003 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 1.078860e+00 eV DEEPMD INFO Energy RMSE : 1.302633e+00 eV DEEPMD INFO Energy MAE/Natoms : 2.697149e-03 eV DEEPMD INFO Energy RMSE/Natoms : 3.256583e-03 eV DEEPMD INFO Force MAE : 1.226273e-01 eV/A DEEPMD INFO Force RMSE : 1.656819e-01 eV/A DEEPMD INFO Virial MAE : 3.309047e+00 eV DEEPMD INFO Virial RMSE : 4.277445e+00 eV DEEPMD INFO Virial MAE/Natoms : 8.272619e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.069361e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ----------weighted average of errors----------- DEEPMD INFO # number of systems : 4 DEEPMD INFO Energy MAE : 8.556657e-01 eV DEEPMD INFO Energy RMSE : 1.083349e+00 eV DEEPMD INFO Energy MAE/Natoms : 2.139164e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.708372e-03 eV DEEPMD INFO Force MAE : 1.225240e-01 eV/A DEEPMD INFO Force RMSE : 1.658100e-01 eV/A DEEPMD INFO Virial MAE : 3.396258e+00 eV DEEPMD INFO Virial RMSE : 4.427710e+00 eV DEEPMD INFO Virial MAE/Natoms : 8.490645e-03 eV 
DEEPMD INFO Virial RMSE/Natoms : 1.106928e-02 eV DEEPMD INFO # ----------------------------------------------- WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information. WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0 WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0 /opt/deepmd-kit-2.2.1/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged. _bootstrap._exec(spec, module) WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. WARNING:tensorflow:From /opt/deepmd-kit-2.2.1/lib/python3.10/site-packages/deepmd/utils/batch_size.py:61: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead. DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.000 DEEPMD INFO Adjust batch size from 1024 to 2048 DEEPMD INFO Adjust batch size from 2048 to 4096 DEEPMD INFO Adjust batch size from 4096 to 8192 2023-10-27 12:09:30.317555: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 219.73MiB (rounded to 230400000)requested by op load/attention_layer_1/Softmax If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. Current allocation summary follows. Current allocation summary follows. 
2023-10-27 12:09:30.317826: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ******************************************************************************__*************_****** 2023-10-27 12:09:30.317867: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at softmax_op_gpu.cu.cc:219 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[8000,60,60] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc DEEPMD INFO Adjust batch size from 8192 to 4096 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 7.743095e-01 eV DEEPMD INFO Energy RMSE : 9.659079e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.935774e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.414770e-03 eV DEEPMD INFO Force MAE : 1.110320e-01 eV/A DEEPMD INFO Force RMSE : 1.505050e-01 eV/A DEEPMD INFO Virial MAE : 4.409566e+00 eV DEEPMD INFO Virial RMSE : 5.957651e+00 eV DEEPMD INFO Virial MAE/Natoms : 1.102392e-02 eV DEEPMD INFO Virial RMSE/Natoms : 1.489413e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.001 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 5.972980e-01 eV DEEPMD INFO Energy RMSE : 7.805804e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.493245e-03 eV DEEPMD INFO Energy RMSE/Natoms : 1.951451e-03 eV DEEPMD INFO Force MAE : 1.110344e-01 eV/A DEEPMD INFO Force RMSE : 1.505657e-01 eV/A DEEPMD INFO Virial MAE : 3.763825e+00 eV DEEPMD INFO Virial RMSE : 5.118495e+00 eV DEEPMD INFO Virial MAE/Natoms : 9.409562e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.279624e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.002 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 7.238635e-01 eV DEEPMD INFO Energy RMSE : 9.075177e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.809659e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.268794e-03 eV DEEPMD INFO Force MAE : 1.101074e-01 eV/A DEEPMD INFO Force RMSE : 1.490124e-01 eV/A DEEPMD INFO Virial MAE : 3.584823e+00 eV DEEPMD INFO Virial RMSE : 4.830293e+00 eV DEEPMD INFO Virial MAE/Natoms : 8.962059e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.207573e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ---------------output of dp test--------------- DEEPMD INFO # testing system : iter.000001/02.fp/data.003 DEEPMD INFO # number of test data : 100 DEEPMD INFO Energy MAE : 8.630818e-01 eV DEEPMD INFO Energy RMSE : 1.055384e+00 eV DEEPMD INFO Energy MAE/Natoms : 2.157704e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.638460e-03 eV DEEPMD INFO Force MAE : 1.106087e-01 eV/A DEEPMD INFO Force RMSE : 1.494560e-01 eV/A DEEPMD INFO Virial MAE : 3.666936e+00 eV DEEPMD INFO Virial RMSE : 4.851710e+00 eV DEEPMD INFO Virial MAE/Natoms : 9.167341e-03 eV DEEPMD INFO Virial RMSE/Natoms : 1.212928e-02 eV DEEPMD INFO # ----------------------------------------------- DEEPMD INFO # ----------weighted average of errors----------- DEEPMD INFO # number of systems : 4 DEEPMD INFO Energy MAE : 7.396382e-01 eV DEEPMD INFO Energy RMSE : 9.326987e-01 eV DEEPMD INFO Energy MAE/Natoms : 1.849096e-03 eV DEEPMD INFO Energy RMSE/Natoms : 2.331747e-03 eV DEEPMD INFO Force MAE : 1.106956e-01 eV/A DEEPMD INFO Force RMSE : 1.498863e-01 eV/A DEEPMD INFO Virial MAE : 3.856288e+00 eV DEEPMD INFO Virial RMSE : 5.209688e+00 eV DEEPMD INFO Virial MAE/Natoms : 9.640719e-03 eV 
DEEPMD INFO Virial RMSE/Natoms : 1.302422e-02 eV DEEPMD INFO # -----------------------------------------------
As before, we inspect the accuracy of the models visually.
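A minimal sketch of an energy parity plot built from the `detail.e.out` file dumped above; the two-column layout (reference energy, predicted energy) follows the DeePMD-kit convention, and the 400-atom normalization is an assumption tied to this dataset:

```python
# Minimal sketch: energy parity plot (reference vs. DP prediction) from detail.e.out.
import numpy as np
import matplotlib.pyplot as plt

natoms = 400                                # atoms per frame in this dataset (assumption)
data = np.loadtxt("detail.e.out")           # columns: reference energy, predicted energy (eV)
e_ref, e_pred = data[:, 0] / natoms, data[:, 1] / natoms

lims = [e_ref.min(), e_ref.max()]
plt.plot(lims, lims, "k--", lw=1)           # y = x reference line
plt.scatter(e_ref, e_pred, s=10)
plt.xlabel("DFT energy (eV/atom)")
plt.ylabel("DP energy (eV/atom)")
plt.tight_layout()
plt.show()
```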
Judging from the Energy RMSE/Natoms and Force RMSE comparison, under identical settings the fine-tuned model is more accurate than the one trained from scratch.
Your advisor is full of praise for your training strategy 👍.
Then another question comes to mind: the model is trained, but how do I use it to run molecular dynamics simulations?
In what follows we use the deep-potential study of solid-state electrolytes published in J. Chem. Phys. 154, 094703 (2021) to show how machine learning empowers molecular dynamics research.
4. Solid-state electrolyte in practice: property calculations with DeePMD
4.1 Deep potential molecular dynamics simulation
Next we take the electrolyte Li10GeP2S12 as the example and run a deep potential molecular dynamics simulation with LAMMPS.
/data/study_examples/DeePMD_SSE
├── 1000K
│   ├── data.lmp
│   └── input.lammps
├── 400K
│   ├── data.lmp
│   ├── input.lammps
│   └── model.pb
├── input_dpa.json
└── job_dpa.json

2 directories, 7 files
- `input.lammps`: the LAMMPS input file, which controls the details of the LAMMPS MD simulation;
- `data.lmp`: the initial configuration for the MD simulation;
- `model.pb`: the deep potential model.
(For more about LAMMPS input files for molecular dynamics simulations, see Chapter 1 of the "Ultra-detailed Hands-on Guide to Deep Potential Materials Computation".)
The input file is an ordinary LAMMPS input, with only two lines that are specific to the deep potential:
pair_style deepmd model.pb
pair_coeff * *
These two lines invoke the DeePMD `pair_style` and supply the model file `model.pb`, which means the interatomic interactions will be computed by the DP model stored in `model.pb`.
In an environment with a compatible version of LAMMPS, the deep potential molecular dynamics simulation can be launched with the command sketched below (its log output follows). And with that, this guide comes to an end. How about it: ready to put DPA-1 pretraining and fine-tuning to work on your own project? Go apply what you've learned~
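A minimal sketch of that launch, assuming the run is started inside `DeePMD_SSE/400K` (where `input.lammps`, `data.lmp`, and `model.pb` live) and that `lmp` is the LAMMPS binary shipped with the deepmd-kit environment:

```python
# Minimal sketch: launch the DP-MD run with the DeePMD-enabled LAMMPS binary.
# Working directory and binary name are assumptions; adjust to your setup.
import subprocess

subprocess.run(["lmp", "-i", "input.lammps"], check=True)   # writes log.lammps plus whatever outputs input.lammps requests
```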
Warning: This LAMMPS executable is in a conda environment, but the environment has not been activated. Libraries may fail to load. To activate this environment please see https://conda.io/activation.
LAMMPS (23 Jun 2022 - Update 1)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Loaded 1 plugins from /opt/deepmd-kit-2.2.1/lib/deepmd_lmp
Reading data file ...
  orthogonal box = (0 0 0) to (26.67021 26.67021 25.5908)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  900 atoms
  read_data CPU = 0.006 seconds
360 atoms in group Li
36 atoms in group Ge
72 atoms in group P
432 atoms in group S
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /opt/deepmd-kit-2.2.1
  source:             v2.2.1
  source branch:      HEAD
  source commit:      3ac8c4c7
  source commit at:   2023-03-16 12:33:24 +0800
  surpport model ver.: 1.1
  build variant:      cuda
  build with tf inc:  /opt/deepmd-kit-2.2.1/include;/opt/deepmd-kit-2.2.1/include
  build with tf lib:  /opt/deepmd-kit-2.2.1/lib/libtensorflow_cc.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at:  /opt/deepmd-kit-2.2.1
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit: Successfully load libcudart.so
2023-10-27 12:10:15.498962: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-27 12:10:15.500931: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:15.546981: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:15.547218: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:16.133692: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:16.133909: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:16.134098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-27 12:10:16.134267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9910 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:00:09.0, compute capability: 7.5
2023-10-27 12:10:16.134642: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2023-10-27 12:10:16.190251: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
  >>> Info of model(s):
  using 1 model(s): model.pb
  rcut in model:      6
  ntypes in model:    4
  using compute id:

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 6 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7
  ghost atom cutoff = 7
  binsize = 3.5, bins = 8 8 8
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.002
Per MPI rank memory allocation (min/avg/max) = 2.64 | 2.64 | 2.64 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
      0          400          -3890.7484     0             -3844.2665     -404.48187
   2000          415.00215    -3848.8776     0             -3800.6524      1642.3776
Loop time of 88.8733 on 1 procs for 2000 steps with 900 atoms

Performance: 3.889 ns/day, 6.172 hours/ns, 22.504 timesteps/s
69.1% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 88.088     | 88.088     | 88.088     |   0.0 | 99.12
Neigh   | 0.69156    | 0.69156    | 0.69156    |   0.0 |  0.78
Comm    | 0.041243   | 0.041243   | 0.041243   |   0.0 |  0.05
Output  | 4.6454e-05 | 4.6454e-05 | 4.6454e-05 |   0.0 |  0.00
Modify  | 0.045213   | 0.045213   | 0.045213   |   0.0 |  0.05
Other   |            | 0.006775   |            |       |  0.01

Nlocal:            900 ave         900 max         900 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:           2352 ave        2352 max        2352 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:        62508 ave       62508 max       62508 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 62508
Ave neighs/atom = 69.453333
Neighbor list builds = 200
Dangerous builds not checked
Generated 0 of 6 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update every 10 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7
  ghost atom cutoff = 7
  binsize = 3.5, bins = 8 8 8
  2 neighbor lists, perpetual/occasional/extra = 1 1 0
  (1) pair deepmd, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
  (2) compute rdf, occasional, half/full from (1)
      attributes: half, newton on
      pair build: halffull/newton
      stencil: none
      bin: none
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.002
Per MPI rank memory allocation (min/avg/max) = 6.056 | 6.056 | 6.056 Mbytes
   Step          Time          Temp          KinEng         TotEng         Press
      0          0             415.00215     48.22527      -3800.6524      1642.3776
    100          0.2           396.73917     46.103024     -3802.0653      2452.8076
Loop time of 4.46563 on 1 procs for 100 steps with 900 atoms

Performance: 3.870 ns/day, 6.202 hours/ns, 22.393 timesteps/s
68.6% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 4.4222     | 4.4222     | 4.4222     |   0.0 | 99.03
Neigh   | 0.034367   | 0.034367   | 0.034367   |   0.0 |  0.77
Comm    | 0.0020063  | 0.0020063  | 0.0020063  |   0.0 |  0.04
Output  | 0.0017228  | 0.0017228  | 0.0017228  |   0.0 |  0.04
Modify  | 0.0050368  | 0.0050368  | 0.0050368  |   0.0 |  0.11
Other   |            | 0.0003258  |            |       |  0.01

Nlocal:            900 ave         900 max         900 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:           2378 ave        2378 max        2378 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:          31335 ave       31335 max       31335 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:        62670 ave       62670 max       62670 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 62670
Ave neighs/atom = 69.633333
Neighbor list builds = 10
Dangerous builds not checked
Total wall time: 0:01:36
运行结束后,我们会得到如下输出文件:
- `dump.lammpstraj`:LAMMPS 输出的轨迹文件;
- `log.lammps`:LAMMPS 输出的日志文件;
- `target.rdf`:径向分布函数(RDF)文件,记录了每 Nfrequency 步输出一次的 RDF;
- `target.msd`:均方位移(MSD)文件,记录了模拟体系中各离子的均方位移随时间的变化。
到这里,我们已经完成了 400 K 下的深度势能分子动力学模拟。本教程的案例包 `/data/study_examples/DeePMD_SSE_done` 文件夹下已给出相应的输出文件,计算流程同上,只是修改了模拟的时间步数。
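在进行性质分析之前,可以先用 Python 检查轨迹文件是否正常。下面是一个读取 `dump.lammpstraj` 的最小草图(假设环境中已安装 ASE,若没有可自行 `pip install ase`;文件名与上文输出一致):

```python
from ase.io import read

# 读取 LAMMPS 文本格式的 dump 轨迹,index=':' 表示读取全部帧;
# dump 中的原子类型是编号(1~4),如需映射到元素符号,
# 可尝试传入 specorder=['Li', 'Ge', 'P', 'S'](顺序需与 data.lmp 中的原子类型一致)
frames = read('dump.lammpstraj', format='lammps-dump-text', index=':')

print(f'共读取 {len(frames)} 帧')
print('每帧原子数:', len(frames[0]))
print('盒子边长 (Å):', frames[0].cell.lengths())
```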
4.2 径向分布函数计算
下面,我们继续使用 Python 脚本对分子动力学模拟的结果进行分析,计算我们实际关心的物理性质,并与文献结果进行对比。以径向分布函数(RDF)为例,可以参考下面的示例脚本读取并绘制 `target.rdf`。
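下面是一个读取并绘制 `target.rdf` 的最小 Python 草图(假设该文件是由 LAMMPS 的 compute rdf 搭配 fix ave/time 写出的;每组原子对具体对应哪两种元素,请以 `input.lammps` 中的设置为准):

```python
import numpy as np
import matplotlib.pyplot as plt

def read_lammps_rdf(filename):
    """读取 LAMMPS fix ave/time 写出的 RDF 文件,返回最后一帧数据。

    文件格式(以 # 开头的为注释行):每一帧先是一行 "timestep nbins",
    随后 nbins 行,每行为 "bin序号 r g1(r) coord1(r) [g2(r) coord2(r) ...]"。
    """
    with open(filename) as f:
        lines = [line for line in f if not line.startswith('#')]
    frames, i = [], 0
    while i < len(lines):
        nbins = int(lines[i].split()[1])
        block = np.array(
            [list(map(float, lines[i + 1 + k].split())) for k in range(nbins)]
        )
        frames.append(block)
        i += 1 + nbins
    return frames[-1]

rdf = read_lammps_rdf('target.rdf')
r = rdf[:, 1]   # 第 2 列:r (Å)
g = rdf[:, 2]   # 第 3 列:第一组原子对的 g(r),具体是哪一对以 input.lammps 为准

plt.plot(r, g)
plt.xlabel('r (Å)')
plt.ylabel('g(r)')
plt.savefig('rdf.png', dpi=150)
```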
4.3 扩散系数计算
通过对均方位移(MSD)随时间的线性增长段做拟合,可以导出各离子的扩散系数,数值如下(计算脚本示例见数值之后):
temperature: 400K
Diffusion Coefficients of Li+: 1.255409630651049e-10 m^2/s
Diffusion Coefficients of Ge4+: 5.933157939101913e-14 m^2/s
Diffusion Coefficients of P5+: 8.503725727827082e-14 m^2/s
Diffusion Coefficients of S2-: 4.7863835675262014e-14 m^2/s
temperature: 1000K
Diffusion Coefficients of Li+: 4.387296831245876e-09 m^2/s
Diffusion Coefficients of Ge4+: 1.5165098488282157e-13 m^2/s
Diffusion Coefficients of P5+: 1.2384106130369005e-13 m^2/s
Diffusion Coefficients of S2-: 6.3535584734912696e-12 m^2/s
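上述数值可以由 `target.msd` 中的均方位移通过爱因斯坦关系 $D = \lim_{t\to\infty}\mathrm{MSD}(t)/(6t)$ 拟合得到。下面是一个最小化的 Python 估算草图(假设 `target.msd` 的各列依次为步数以及 Li、Ge、P、S 的 MSD,单位 Å²;实际列含义请以 `input.lammps` 中的 fix ave/time 设置为准):

```python
import numpy as np

# 假设 target.msd 的各列依次为:步数, Li, Ge, P, S 的 MSD(单位 Å^2);
# 时间步长 0.002 ps,与 input.lammps / 上方日志中的 "Time step: 0.002" 一致
data = np.loadtxt('target.msd', comments='#')
time_ps = data[:, 0] * 0.002          # 步数 -> 时间 (ps)
labels = ['Li', 'Ge', 'P', 'S']

for col, label in enumerate(labels, start=1):
    msd = data[:, col]                # 单位 Å^2
    half = len(time_ps) // 2          # 只取后半段做线性拟合,避开初始的非扩散区
    slope, _ = np.polyfit(time_ps[half:], msd[half:], 1)   # 斜率,单位 Å^2/ps
    # 爱因斯坦关系:D = slope / (2 * 维度) = slope / 6,并把 Å^2/ps 换算为 m^2/s
    diffusion = slope / 6.0 * 1e-20 / 1e-12
    print(f'Diffusion coefficient of {label}: {diffusion:.3e} m^2/s')
```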
对比上述两个温度下的数值可以看出:体系中 Li+ 的扩散速率最快,占主导作用;且温度升高时,扩散系数明显增大。
文献中也报道了 400 K 和 1000 K 下各离子的扩散系数(单位:$\mathrm{m^2/s}$),与我们的计算结果非常接近。

我们本次指南就到这里结束啦。怎么样,是不是对使用 DPA-1 预训练模型已经跃跃欲试了?尝试学以致用到自己的课题上吧~
5. 参考来源
DPA-1相关:
- DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation
- DeePMD-kit’s documentation
- 快速上手深度势能预训练模型 DPA-1
- DPA-1: 共建覆盖元素周期表的预训练大模型
- 张铎:DPA-1预训练模型介绍&上机实践
推荐阅读DPMD系列Notebook:
参考文献
- Han Wang, Linfeng Zhang, Jiequn Han, and Weinan E. DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Comm., 228:178–184, 2018. doi:10.1016/j.cpc.2018.03.016.
- Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, and Han Wang. DeePMD-kit v2: A software package for Deep Potential models. 2023. doi:10.48550/arXiv.2304.09409.
- Huang J, Zhang L, Wang H, Zhao J, Cheng J, E W. Deep potential generation scheme and simulation protocol for the Li10GeP2S12-type superionic conductors. J Chem Phys. 2021;154(9):094703. doi:10.1063/5.0041849
- https://docs.deepmodeling.com/projects/deepmd/en/master/index.html
- https://github.com/deepmodeling/deepmd-kit