Geometry-enhanced Pre-training on Interatomic Potentials

Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang
DOI: https://doi.org/10.1038/s42256-024-00818-6
2024-04-13
Abstract: Machine learning interatomic potentials (MLIPs) enable molecular dynamics (MD) simulations with ab initio accuracy and have been applied to various fields of physical science. However, the performance and transferability of MLIPs are limited by insufficient labeled training data, because obtaining the labels requires expensive ab initio calculations, especially for complex molecular systems. To address this challenge, we design a novel geometric structure learning paradigm that consists of two stages. We first generate a large quantity of 3D configurations of the target molecular system with classical molecular dynamics simulations. Then, we propose geometry-enhanced self-supervised learning consisting of masking, denoising, and contrastive learning to better capture the topology and 3D geometric information from the unlabeled 3D configurations. We evaluate our method on various benchmarks ranging from small-molecule datasets to complex periodic molecular systems with more types of elements. The experimental results show that the proposed pre-training method can greatly enhance the accuracy of MLIPs with little extra computational cost and works well with different invariant or equivariant graph neural network architectures. Our method improves the generalization capability of MLIPs and helps to realize accurate MD simulations for complex molecular systems.
Chemical Physics, Computational Physics
What problem does this paper attempt to address?
The paper addresses the dependence of machine learning interatomic potentials (MLIPs) on large amounts of labeled data, which are expensive to obtain because every label requires an ab initio calculation. To make use of unlabeled configurations instead, the authors propose a two-stage geometric structure learning framework: first, unlabeled 3D configurations of the target system are generated with classical molecular dynamics simulations; second, geometry-enhanced self-supervised learning (masking, denoising, and contrastive learning) is applied to these configurations to capture their topological and 3D geometric information. According to the paper, this pre-training improves the accuracy and generalization ability of MLIPs at little extra computational cost and is compatible with different invariant and equivariant graph neural network architectures.
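The first stage, generating unlabeled configurations with classical MD, can be pictured with a toy sketch. It is not the authors' setup: it assumes a Lennard-Jones pair potential as a stand-in for a classical force field, reduced units, no periodic boundary conditions, and a hypothetical 8-atom starting cube. The loop runs velocity-Verlet dynamics and keeps only the coordinates of periodic snapshots, which is what makes the resulting configurations unlabeled: no ab initio energies or forces are ever computed.

```python
import math
import random


def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces; a toy stand-in for a classical force field."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            sr6 = (sigma * sigma / r2) ** 3
            # -dU/dr along d for U = 4*eps*((s/r)^12 - (s/r)^6), divided by r
            coef = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += coef * d[k]
                forces[j][k] -= coef * d[k]  # Newton's third law
    return forces


def sample_configurations(pos, n_steps=200, dt=0.002, save_every=20,
                          kT=0.1, mass=1.0, seed=0):
    """Run velocity-Verlet MD and keep snapshots as unlabeled configurations."""
    rng = random.Random(seed)
    pos = [list(p) for p in pos]
    # Gaussian initial velocities at temperature kT (reduced units).
    std = math.sqrt(kT / mass)
    vel = [[rng.gauss(0.0, std) for _ in range(3)] for _ in pos]
    f = lj_forces(pos)
    frames = []
    for step in range(1, n_steps + 1):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / mass  # first half kick
                pos[i][k] += dt * vel[i][k]             # drift
        f = lj_forces(pos)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / mass  # second half kick
        if step % save_every == 0:
            # Only coordinates are stored: no energy or force labels.
            frames.append([list(p) for p in pos])
    return frames


# Hypothetical starting structure: 8 atoms on a 2x2x2 cube near the LJ minimum.
a = 1.12
start = [[a * x, a * y, a * z] for x in (0, 1) for y in (0, 1) for z in (0, 1)]
frames = sample_configurations(start)
print(len(frames), "unlabeled configurations of", len(frames[0]), "atoms")
```

In practice a production MD engine with an established force field would replace this loop; the sketch only illustrates why sampling is cheap compared with labeling each frame by an ab initio calculation.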
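The masking and denoising objectives can be illustrated with a minimal sketch. It assumes that masking hides atom types and asks the model to recover them with a cross-entropy loss, and that denoising adds isotropic Gaussian noise to the coordinates and asks the model to predict that noise with a mean-squared-error loss. The mask token, mask ratio (0.15), noise scale (0.05) and toy water geometry are hypothetical choices, not values from the paper, and the graph neural network itself is omitted: zero-valued placeholder predictions stand in for its outputs.

```python
import math
import random

MASK_TOKEN = 0  # hypothetical index reserved for "masked atom"; elements use 1..K-1


def mask_atom_types(atom_types, mask_ratio=0.15, rng=None):
    """Replace a random subset of atom types with MASK_TOKEN; return recovery targets."""
    rng = rng or random.Random(0)
    n_mask = max(1, round(mask_ratio * len(atom_types)))
    idx = sorted(rng.sample(range(len(atom_types)), n_mask))
    corrupted = list(atom_types)
    for i in idx:
        corrupted[i] = MASK_TOKEN
    return corrupted, idx, [atom_types[i] for i in idx]


def masked_type_loss(logits, targets):
    """Cross-entropy of per-atom type logits (one row per masked atom) vs. targets."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += log_z - row[t]
    return total / len(targets)


def perturb_coordinates(pos, sigma=0.05, rng=None):
    """Add isotropic Gaussian noise to every coordinate; the noise is the target."""
    rng = rng or random.Random(0)
    noise = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in pos]
    noisy = [[p[k] + e[k] for k in range(3)] for p, e in zip(pos, noise)]
    return noisy, noise


def denoising_loss(pred_noise, true_noise):
    """Mean squared error between predicted and injected per-atom noise vectors."""
    sq = sum((p - t) ** 2
             for pa, ta in zip(pred_noise, true_noise) for p, t in zip(pa, ta))
    return sq / (3 * len(true_noise))


# Toy water molecule: hypothetical type indices (1 = H, 2 = O), coordinates in angstrom.
types = [2, 1, 1]
pos = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
corrupted, idx, targets = mask_atom_types(types, rng=random.Random(1))
noisy, noise = perturb_coordinates(pos, rng=random.Random(2))
# A GNN encoder would map (corrupted, noisy) to type logits and per-atom noise;
# zero "predictions" here only show how the two losses are evaluated.
print(masked_type_loss([[0.0, 0.0, 0.0]] * len(idx), targets))  # log(3) for uniform logits
print(denoising_loss([[0.0] * 3 for _ in pos], noise))
```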
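For the contrastive part, one standard formulation is an InfoNCE loss with in-batch negatives: the embeddings of two views of the same configuration form a positive pair, and all other pairings in the batch act as negatives. The sketch below assumes this formulation, cosine similarity and a temperature `tau` of 0.1; how the paper actually constructs its views and negatives is not specified here, and the 2-D embeddings are made-up stand-ins for encoder outputs.

```python
import math


def info_nce(z1, z2, tau=0.1):
    """InfoNCE: row i of z1 should match row i of z2 against all other rows of z2."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a = [normalize(v) for v in z1]
    b = [normalize(v) for v in z2]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities to every candidate in the second view, scaled by 1/tau.
        logits = [sum(x * y for x, y in zip(ai, bj)) / tau for bj in b]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive pair
    return total / len(a)


# Hypothetical 2-D embeddings of two views of the same three configurations.
view1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]]
view2 = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]
print(info_nce(view1, view2))                  # small: positives are the most similar
print(info_nce(view1, view2[1:] + view2[:1]))  # large: positives misaligned
```

Misaligning the views in the second call raises the loss sharply; minimizing this loss is what pushes an encoder to map different views of the same configuration to nearby embeddings.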